Tuesday, April 1, 2014

Sorting And Efficiency

Sorting data structures is one of the most frequently used operations in computer science…. Operations such as sorting a sequence of words in alphabetical order, or sorting a sequence of numbers in increasing order are just basic examples that illustrate how vast the purposes of sorting could be.
We have seen some algorithms that allow the implementation of this process successfully so we will discuss the differences between them.

When we are sorting, the first question we should ask ourselves is: What is the most efficient algorithm? In other words, what is the algorithm that will return the sorted data structure by executing the least possible number of steps?

Every function can be bounded above by another function. This upper bound is called Big-Oh. It shows a bound on the worst case running time for a given function.

First of all, we will talk about the sorting algorithms that are typically executed more inefficiently. These are bubble sort, selection sort and insertion sort. They are bounded above by O(n^2). I will assume that if you are reading this entry, you probably have a sense of how these algorithms are implemented. So let’s talk about why they are usually the slowest implementations of a sorting operation.
- Bubble sort makes n-1 comparisons during the first pass on a list containing n elements. Then, we will reduce by 1 the number of comparisons evaluated by the function for each pass. This is why it’s such an inefficient algorithm (it makes something less than n-1 comparisons n number of times…); the elements are not placed in their final location when they are manipulated (except for the largest one in the unsorted part of the list).  The passes keep taking place until the list is finally sorted. However, we should point out that using bubble sort is the most efficient way to know if a list is already sorted since it won’t swap any item on the first pass and we will know that the structure is sorted! (which is O(n)!)
- Selection sort is also a very inefficient algorithm. We will need to traverse the whole list n-1 number of times to find the largest (or smallest) item and put it in its right location. Again, we can see intuitively that this is O(n^2).
              - Insertion sort also takes n-1 passes but it only needs to compare the element being evaluated to the sorted part of the list (which has less than n elements). This is also O(n^2). But note that in the best case (already sorted list), each item only needs one comparison so we will get O(n)!

Now we will talk about the algorithms using a divide and conquer strategy. We say “divide and conquer” because we divide the list being passed into smaller sublists in order to improve the performance of the program. At the end, we will “conquer” or bring these pieces together. This picture shows how the process works:
- The merge sort algorithm recursively splits the given list in half. Then we place the smallest items of each sublist in the right location of a bigger list. At the end we will get the sorted list. The “dividing” step of this algorithm is O(log n) because we get smaller sublists each time the recursive step is applied. The “conquering” or “merging” step evaluates each item once. So it’s O(n). Putting these pieces together we can see that the merging algorithm is O(n log n), which is considerably faster than the algorithms we talked about previously.
              - Quick sort is usually faster than merge sort. However, it’s remarkably slower on its worst-case run-time. If the designed pivot value in each pass is always either the biggest or smallest element in the list, then the function will execute one whole pass to locate a single element in the right location each time. This, is O(n^2), we compare n-1 items n times.

We should also make the point about Timsort. This is Python’s built-in sorting algorithm. It's typically O(n log n) and performs faster than any of the algorithms previously stated (as we’ve seen in lab 10). This video will clarify the operating process of this algorithm: https://www.youtube.com/watch?v=jVXsjswWo44


Summarizing, we’ve seen which sorting algorithms are typically the fastest and which ones are the slowest. But again, we should always consider what is it exactly that we want to do with our program and what is the input that we are going to pass to it since sometimes an algorithm that is usually slow could be really efficient in a specific context and vice-versa.

Bibliography:
picture 1:
http://www.wikihow.com/images/2/26/Sort-a-Deck-of-Cards-Step-1.jpg
picture 2: picture: http://www.cise.ufl.edu/research/ParallelPatterns/PatternLanguage/AlgorithmStructure/Figures/d-and-c.gif
video: research task by Mark Lee.

No comments:

Post a Comment