Sorting and Efficiency
For this assignment, you will experiment with two different sorting
algorithms, selection sort and merge sort.
To begin, you will need to copy the file Sorts.h, which
contains definitions of the
SelectionSort and MergeSort functions.
- Write a main program named sortnums.cpp that can be used to
compare the performance of these two
algorithms. Your program should
prompt the user for the number of items to be sorted, then construct a vector
of that many random integers. It should then run each sort on a copy of that
vector (so both algorithms see the same random data) and time how long each
takes, using the clock function from the ctime library. The time for each sort
should be displayed in a readable format; a minimal sketch of such a program
appears below.
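One possible structure, given as a minimal sketch: it assumes that Sorts.h declares SelectionSort and MergeSort as plain functions taking a vector<int> by reference, which may differ from the actual header.

```cpp
// sortnums.cpp -- minimal sketch (assumes Sorts.h declares
// void SelectionSort(vector<int>&) and void MergeSort(vector<int>&);
// adjust the calls if the actual signatures differ).
#include <iostream>
#include <vector>
#include <cstdlib>   // rand, srand
#include <ctime>     // clock, CLOCKS_PER_SEC, time
#include "Sorts.h"
using namespace std;

int main() {
    cout << "How many items should be sorted? ";
    int n;
    cin >> n;

    srand(static_cast<unsigned>(time(0)));
    vector<int> original(n);
    for (int i = 0; i < n; i++) {
        original[i] = rand();
    }

    // Time each sort on its own copy, so both see the same random data.
    vector<int> nums = original;
    clock_t start = clock();
    SelectionSort(nums);
    clock_t stop = clock();
    cout << "Selection sort: "
         << double(stop - start) / CLOCKS_PER_SEC << " seconds" << endl;

    nums = original;
    start = clock();
    MergeSort(nums);
    stop = clock();
    cout << "Merge sort:     "
         << double(stop - start) / CLOCKS_PER_SEC << " seconds" << endl;

    return 0;
}
```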
- Perform 3 different executions of your program on each of the following
list sizes: 500 items, 1000 items, 2000
items, and 4000 items. Report your timings in a table such as the one below.
|                            | 500 items | 1000 items | 2000 items | 4000 items |
|----------------------------|-----------|------------|------------|------------|
| SELECTION SORT: 1st timing |           |            |            |            |
| SELECTION SORT: 2nd timing |           |            |            |            |
| SELECTION SORT: 3rd timing |           |            |            |            |
| MERGE SORT: 1st timing     |           |            |            |            |
| MERGE SORT: 2nd timing     |           |            |            |            |
| MERGE SORT: 3rd timing     |           |            |            |            |
- For each sorting algorithm, how consistent are the timings? That is, when
you run the algorithm on three different lists
of the same size, are the timings the same? Should they be? Explain your
answer, and describe any patterns
concerning the consistency of the timings.
- Recall that selection sort is an O(N²) algorithm, whereas merge sort is
O(N log N). This means that it takes "on the order of" N² operations to sort
a list of N items using selection sort, whereas it takes "on the order of"
N log₂ N operations using merge sort. As such, merge sort tends to be the
faster sort. Do your timings support this claim?
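  To see the size of the gap, a rough back-of-the-envelope count at N = 4000:
  N² is 16,000,000, while N log₂ N is about 4000 × 12 ≈ 48,000, a difference of
  more than two orders of magnitude.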
- Recall that for an O(N²) algorithm such as selection sort, the number of
operations grows in proportion to the square of the list size. So, if you
double the size of the list, the number of operations increases by a factor of
2² = 4. By contrast, for an O(N log N) algorithm such as merge sort, doubling
the list size increases the number of operations by a factor only slightly
larger than 2. Do your timings demonstrate these rates of growth? Explain.
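  As a rough check for merge sort: going from 1000 to 2000 items, the operation
  count grows by a factor of (2000 log₂ 2000)/(1000 log₂ 1000) = 2 × (10.97/9.97)
  ≈ 2.2, compared with the full factor of 4 for selection sort.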
- One attractive property of insertion sort is that it performs more
efficiently when the list is already
partially sorted. Augment your program so that you can test whether selection
sort and/or merge sort share this property. In particular, your modified
program should time each sort on the same random list (as before), then retime
each on the sorted version of that list; a sketch of one such modification
appears below. Is there a significant improvement when the list is already
sorted? Would you expect there to be any? Explain your answers.
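One way to structure the modification, sketched under the same assumption about the Sorts.h interface; the timeSort helper is a hypothetical name, not part of the assignment files. Taking the vector by value means each timed run sorts a fresh copy.

```cpp
// Sketch of the retiming logic. The timeSort helper is hypothetical;
// it takes its vector by value so each timed run sorts a fresh copy.
#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime>
#include "Sorts.h"
using namespace std;

// Returns the seconds one run of sort takes on a copy of data.
double timeSort(void (*sort)(vector<int>&), vector<int> data) {
    clock_t start = clock();
    sort(data);
    return double(clock() - start) / CLOCKS_PER_SEC;
}

int main() {
    cout << "How many items should be sorted? ";
    int n;
    cin >> n;

    srand(static_cast<unsigned>(time(0)));
    vector<int> randomList(n);
    for (int i = 0; i < n; i++) {
        randomList[i] = rand();
    }

    vector<int> sortedList = randomList;
    MergeSort(sortedList);   // build the presorted version once

    cout << "Selection sort, random input: "
         << timeSort(SelectionSort, randomList) << " seconds" << endl;
    cout << "Selection sort, sorted input: "
         << timeSort(SelectionSort, sortedList) << " seconds" << endl;
    cout << "Merge sort, random input:     "
         << timeSort(MergeSort, randomList) << " seconds" << endl;
    cout << "Merge sort, sorted input:     "
         << timeSort(MergeSort, sortedList) << " seconds" << endl;

    return 0;
}
```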
- If you look closely at the implementations for the sorting algorithms, you
will note that there are two different ways in which the algorithms access the
vector: comparisons and swaps. For example, selection sort repeatedly
traverses the vector, comparing each data value with the smallest so far, and
then swaps the smallest value into the
correct position. Likewise, when merge sort merges two sorted lists, it
repeatedly compares values from the lists
and swaps the smallest into a temporary array (and eventually back into the
original array). When dealing with
integer values, the costs of comparing and swapping are roughly comparable.
If the data to be compared/assigned is
complex, however, the time required to swap values can far exceed the time
required for comparisons. For example, if
the vector contained long strings, you would expect the time required for
swaps to be more significant.
Write a second main program named sortwords.cpp that is similar to
sortnums.cpp except that it
sorts vectors of random strings. After prompting the user for the size of the
vector, the program should generate
random letter sequences of length 20 (although this length should be easily
changeable) and sort them using selection
sort and merge sort; a sketch of the string-generation code appears below.
Perform repeated executions of your program to complete a table similar to the
one in part 2. Which of the two algorithms is more affected by increasing the
size of the objects being sorted? Does this make sense when you consider the
number of comparisons vs. swaps for each algorithm? Justify your answer.
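A sketch of the string-generation side of sortwords.cpp; the WORD_LENGTH constant and RandomWord helper are illustrative names, and the sorts are assumed to accept a vector<string> as well (e.g., via templates or overloads in Sorts.h).

```cpp
// sortwords.cpp (sketch) -- WORD_LENGTH and RandomWord are illustrative
// names; the timing code would be the same as in sortnums.cpp.
#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
#include <ctime>
using namespace std;

const int WORD_LENGTH = 20;   // change here to test other string sizes

// Builds a random sequence of lowercase letters of the given length.
string RandomWord(int length) {
    string word;
    for (int i = 0; i < length; i++) {
        word += char('a' + rand() % 26);
    }
    return word;
}

int main() {
    cout << "How many words should be sorted? ";
    int n;
    cin >> n;

    srand(static_cast<unsigned>(time(0)));
    vector<string> words(n);
    for (int i = 0; i < n; i++) {
        words[i] = RandomWord(WORD_LENGTH);
    }

    // ... time SelectionSort and MergeSort on copies of words,
    //     exactly as in sortnums.cpp ...

    return 0;
}
```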