Sorting is the process of taking a number of items, and placing them in increasing or decreasing order. The act of sorting is one of the most commonly used algorithms in a computer. In fact, since you switched your computer on it has probably performed many sorts already for various reasons. It is also a problem which has been studied widely for a very long time, and much is known about the many different ways of performing the task. There are a huge number of algorithms when it comes to sorting, and many websites will only tell you about a few of them. The aim of this article is to be a source of information about as many of these algorithms as possible.
There are a number of considerations however, when it comes to sorting.
Sorting all at once vs. Incremental Sorting:
The first thing to be aware of with sorting is that even when you want sorted data, blatantly sorting all of the data at once in isn't always the best choice. There are other options such as inserting items into an already sorted array or list as they are read in or better yet, maintaining items in an ordered binary tree. Sometimes this isn't practical, such as if your target device does not have enough RAM to store all the extra pointers required for a tree. Or perhaps you need a better than O (n log n) running time. Or maybe you insert and remove far more often then you need the data in sorted order, and you don't want expensive insertions.
Data Structure (Random access vs. Sequential access):
What structure is the data in before and after the sort? Sorting can be done on arrays, doubly-linked-lists, singly-linked-lists, and circularly linked-lists to name a few structures. All of these data structures can be sorted efficiently, but the algorithm which suits best for one data type won't necessarily be the best for another data type.
Internal vs. external sorting:
You've decided that you need/want to sort, that's great, now do you have all the data available at once? Will it fit in RAM? If you can't hold it all in RAM, then you'll need an external sorting algorithm. If your data hasn't all arrived yet (e.g. over a network), and you want to start sorting before it all arrives, then some algorithms may be out of the question.
In-place vs. making a sorted copy:
Often times, we have the data and we just want it sorted, but sometimes we want to keep the data in the order it is, but just access them in sorted order, or produce a copy of the data which is sorted. Some algorithms naturally work in place as they require no extra memory, but some algorithms will give you a sorted copy, or at least a list of indexes which index the original items in sorted order.
Parallel vs. Sequential:
What resources do you have available to you for the sorting? If you have multiple CPUs or CPU cores available and you are sorting enough items to be able to benefit from using multiple CPUs, then you may be able to use parallel sorting with a “Divide and conquer” style algorithm. For the moment I'll assume that you don't have that luxury, or don’t want to do any threading anyway. Multiple core CPUs are becoming quite popular at the moment though.
Stable vs. Not Stable:
Is it possible to have duplicates in the data you want to sort? For example a list of ping times from various servers would be likely to have duplicates, but number plates wouldn't. Some algorithms such as pigeonhole sort will not work with duplicates at all. Some algorithms will work with duplicates, but might possibly place the items of equal value in any order.
Stability is the term used to describe whether the order of items with equal values is preserved by the sorting algorithm. A sort which does not reorder items with equal keys is said to be stable. The term has nothing to do with whether the algorithm might sometimes fail to sort correctly, or might crash the program (which no properly implemented sorting algorithm should ever do). Knowing whether or not you need a stable sorting algorithm is somewhat important. For example early versions of the game UT2004 used a non-stable sorting algorithm when displaying the list of servers ordered by ping time. As more servers were pinged, the results were added to and the list resorted. But any servers with equal pings would keep switching order upon each subsequent sort, making it hard to click on the right one, presumably because the quick sort algorithm was used. This was later corrected, presumably by using a stable algorithm. There are also times when you have duplicates, but you don't really care about the order that they appear in the sorted data set.
Big-Oh Notation:
At time you will notice that I
refer to an algorithm being O(n) or O(nlogn), or having O(n) memory usage. This
is known as Big-Oh
Notation. In sorting, n always refers to the number of items being sorted.
The following Big-Oh notations are ordered from slowest-growing (good) to
fastest-growing (bad):
O(1), O(logn), O(n), O(nlogn), O(nlog2n), O(n2), O(2n),
O(n!).
See also Searching.