More Array Sorting

by Malcolm Phillips

< Back to other Array Sorting Techniques

Tree Sort Techniques

Binary Tree Sort
Well, Exchanging, Inserting and Merging techniques have already provided plenty to talk about. But wait, we're only just getting started! One conceptually easy way to sort items is to insert them into a binary tree, and then read out all the items using an in-order traversal. This is known, of course, as Tree sort. Beginners often mistake this for an O(n) sorting algorithm, because they fail to take into account two things:
The tree grows as you insert more items, meaning that future inserts take longer, and
The tree might degenerate into a linked-list, if for example, items are inserted in sorted order.
It is in fact O(n2), but with some luck in the order items are inserted, it can run in O(nlogn) time.

Tree Sort is stable.

[show source code]

http://en.wikipedia.org/wiki/Binary_tree_sort
http://www.nist.gov/dads/HTML/treeSort.html

Randomised Binary Tree Sort
The problem with Tree Sort is that the tree can be rather unbalanced, especially if the data was already sorted. Such a case would be really easy for many algorithms, but alas makes Tree Sort as bad (if not worse) than Insertion Sort. The fix is to use some kind of balanced tree. There are a number of self-balancing tree algorithms out there, or other means of ensuring a tree is reasonably well balanced.
Although we know that tree sort performs poorly when there is a lot of order in the data, it performs reasonably well when the data is quite randomised. So we could either insert items in random order, or we could insert them into a randomised binary search tree. The approach I've taken here is the former.
Randomised Binary Tree Sort is stable.

[show source code]

http://www.cs.princeton.edu/~rs/strings/paper.pdf

Splay Tree Sort / AVL Tree Sort / Red Black Tree Sort / AA (Anderson) Tree Sort / Scapegoat Tree Sort
Random insertion might be better than ordinary insertion, but it's still possible to have it perform poorly, however unlikely that may become. The principle for all balanced binary tree sorts is the same as for a normal binary tree sort except that they in turn employ other algorithms to keep the tree balanced, and some may require additional memory to do so. Here are a bunch of four common self-balancing tree sorting algorithms.
Tree Sorts are stable.

Splay Sort and Scapegoat Sort are of particular interest in that it doesn't require any extra variables on top of what is used in an ordinary tree sort. The others store extra data in each node. Splay Sort adapts extremely well to data containing various patterns, but other tree sorts give more uniform sort times. Actually when Splay-sorting a doubly-linked list, the previous and next pointers of the doubly-linked list can be reused as left and right nodes whilst in the tree, meaning that no extra memory is required for the sort.

[show source code]

http://en.wikipedia.org/wiki/Binary_tree_sort

Heap Sort Techniques

Heap Sort
A heap is a data structure designed to allow constant time access to the maximum item (or minimum, but not both), without using any extra memory. The problem with tree sorts is that the trees themselves take up a lot of memory. But hang on; we can store a tree implicitly in an array without the need for pointers or rebalancing. Here's how we do that:

The first item is the root.
The next two items are the children of the root.
The next four items are the children of the root's two children.
etc.

Now the math to go between items and their parent's and children:

First child = parent*2+1.
Second child = parent*2+2.

However we'll store items in the order of a heap rather than a tree, because an ordered tree wouldn't really help much. In a heap, the parent value is not between the values of its children (unlike an ordered binary tree). Instead, the parent is greater than either of its children (or small in a min-heap).
Heap Sorts are NOT stable.

[show source code]

http://en.wikipedia.org/wiki/Heapsort
http://www.nist.gov/dads/HTML/heapSort.html

Fibonacci Heap Sort
Someone once had this crazy idea for nodes to have three children instead of two. To go to a child you multiply by 3 and add -1, 0 or 1. To go back to the parent, you add 1 then divide by 3. You need to start at 1 of course since 3 x 0 = 0. This modification adds a lot of complexity, but actually seems to give Fibonacci Heap Sort a speed advantage over regular Heap Sort.
Heap Sorts are NOT stable.

[show source code]

http://en.wikipedia.org/wiki/Fibonacci_heap

Weak Heap Sort
Some speed gains can be made by relaxing the heap condition such that only right nodes need to be less than their parent. It does mean that we need to store an extra bit of info for each node though. This also tends to beat ordinary Heap Sort, but at the cost of using extra memory.
Heap Sorts are NOT stable.

[show source code]

http://www.nist.gov/dads/HTML/weakheapsort.html
http://en.wikipedia.org/wiki/Heap_%28data_structure%29
http://books.google.co.nz/books?id=4SHZRSV5emYC&pg=PA39&lpg=PA39&dq=Weak+Heap+Sort+algorithm&source=bl&ots=5ZMm9KiqHB&sig=I9rFijVOxajxrdet-4jfdRuP3ns&hl=en&ei=u9GjS-fFOZiekQWMzcXJCA&sa=X&oi=book_result&ct=result&resnum=9&ved=0CC4Q6AEwCA#v=onepage&q=Weak%20Heap%20Sort%20algorithm&f=false

Smooth Sort
Dijkstra made a modification to Heap Sort which allowed it to run in O(n) time best case, for (almost) already sorted arrays. It's quite complex and slower than other heap sorting algorithms, much of the time. It is in fact too complex for me to describe here.
Heap Sorts (of which this is one) are NOT stable.

[show source code]

http://en.wikipedia.org/wiki/Smoothsort
http://www.nist.gov/dads/HTML/smoothsort.html

Beap Sort
Beap is short for "Bi-Parental Heap". In a beap, nodes have two children, and two parents. Other than that it is quite similar to a heap, and the algorithm itself is also very similar. Adjacent nodes share a child and a parent. This forms more of a grid structure as opposed to a tree structure. It has no practical usage due to being far less efficient than a binary heap. I've only really implemented it out of curiosity. Operations tend to take O(sqrt(n)) time. The sort itself is O(n1.5).
Beap Sort is NOT stable.

[show source code]

http://en.wikipedia.org/wiki/Beap

Quick Sort Techniques

Quick Sort

A quick way to sort items is to pick one of them and divide the rest into two partitions, one with items less than the item picked, and one with items greater than the item picked. Then you do the same with each partition, and finally join the splitter and the sorted partitions together. This is one of the many algorithms that uses what is known as the divide and conquer approach. The problem is divided into smaller sub-tasks, each of which is divided again in the same manner until the problem becomes trivial (there is only one item). The easiest way to pick a splitter is to pick the first item, however that is a horrible choice for when the list is already sorted or reverse sorted, as might often be the case. A fairly good and quick method is to pick the middle item of the array, which is what is done here.
Quick Sorts are NOT stable.

[show source code]

http://en.wikipedia.org/wiki/Quicksort
http://www.nist.gov/dads/HTML/quicksort.html
http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/quick/quicken.htm
http://www.sorting-algorithms.com/quick-sort

Randomised Quick Sort
Whilst the inner loop which partitions the items in two runs very fast, unfortunately picking the first item can have bad consequences. If the array is already sorted then one of the partitions will contain all of the remaining items, meaning that each partitioning step only reduces the workload by one, and performance becomes terrible. Picking the middle item can be just as bad depending on the order of the data. This can be improved in several ways. First off, you can try picking a random splitter. That way you'd have to be very unlucky to pick a very bad splitter many times over. Quick Sorts are NOT stable.

[show source code]

http://en.wikipedia.org/wiki/Quicksort#Randomized_quicksort_expected_complexity

Median-of-Three Quick Sort
Some people might not be too keen on using a random splitter. You never know when it might perform badly one time, yet quickly the next, sorting exactly the same sequence. Also with picking the middle item, you can still get poor results every now and then. A better alternative is to sample a few items and then pick the middle (median) value. This generally gives very good results, and makes hitting the worst case extremely unlikely. Quick Sorts are NOT stable.

[show source code]

http://en.wikipedia.org/wiki/Quicksort
http://www.people.carleton.edu/~grossm/

Mean Sort
The key to good Quick Sort performance really is in picking good splitter values. If we're prepared to spend a little time to first look at all the values, then we can find a close to ideal splitter. All we need is to find the minimum and maximum value and pick halfway between, thus using an artificial cut-off value. This is what I have called as Mean Sort. The drawback, besides having to iterate over the array first, is that it assumes that the distribution of the items is roughly linear. If it is not, then this wont work well.
Mean Sort is NOT stable.

[show source code]

Proportion Extend Sort
The main problem with Quick Sort is when the chosen splitter is not close to placing half of the items on either side. In this algorithm the median technique is again used, however the median is chosen from more than 3 items. In order to pick the median of more than three items, you have to select m items, and then sort those items and choose the one in the middle. This brings us back to the original problem of sorting some number of items, in order to do that. Since we're picking a median from a smaller number of items than we actually want to sort overall, we do the logical thing of using a recursive call. So in order to pick the median we first have to pick the median of a smaller number of items, and to do that we may have to pick the median of an even smaller number of items etc. Typically the number of items m for picking the mean is n/16. This algorithm therefore has three logical recursive calls (I always convert the last one to iterative, for performance). However since it is likely to pick a better median than the median-of-three technique, it can actually perform faster.
Proportion-Extend Sort is NOT stable.

[show source code]

http://epubs.siam.org/SICOMP/volume-31/art_34290.html

Introspective Sort
Unfortunately it's still possible to get crappy performance, just extremely unlikely. One can modify Quick Sort to monitor the recursion depth and if it gets too deep, switch to Heap Sort. Heap Sort is not quite as fast as Quick Sort most of the time, but is much faster than Quick Sort when it comes to the worst case. This change guarantees that it will never take far too long. Quick sort variants also often only sort until the items are approximately in order, and at the end a final pass with insertion sort is done. This ends up being faster than letting Quick sort mess around with the fiddly little unsorted portions. So this algorithm is a blend of 3 others!
Intro Sort is NOT stable because it uses Quick Sort and Heap Sort internally, which are not stable, even though it also uses Insertion Sort which is stable.

[show source code]

http://www.cs.rpi.edu/~musser/gp/introsort.ps

Oddball Techniques

Shear Sort
This algorithm treats the input array as it it were a two-dimensional i * j (grid) of items rather than a one dimensional array of n items. This works alright when n can be factorised into approximately equal factors. The nature of the algorithm also means that it can run well on multiple process or cores at once. However it works terribly when n is prime.
Shear Sort is NOT stable.

[show source code]

http://www.cs.rit.edu/~atk/Java/Sorting/sorting.html

Optimistic Sort
This algorithm is a simple fixed network of n*logn compare-exchange operations, which when repeated at most logn times produces a sorted array. I dreamt this up on 13th April 2007. It can't realistically compete with Quick Sorts average case, but is on par with Bitonic Sort algorithms in terms of Big-Oh notation. However this algorithm can finish in fewer than logn passes, in which case it beats Bitonic Sort. The lower bound is O(nlogn) and the upper bound is O(nlog2n).
Each pass is very simple. Perform a compare and swap of items from the first half with the second half. Do this by starting from each end and proceeding towards the middle. Compare and swap the items from each location as you go, swapping if out or order. Repeat on each half, until the there is only 1 item. This often does not in itself fully sort the items, and usually more passes are needed.
It does guarantee that the smallest and largest items find their way to the correct positions in one pass, however. It also unexpectedly seems to be the case that after each compete pass, there are no strictly decreasing runs of length three or greater.
Care must be taken to deal with the middle item wherever the portion to sort is not an even size. The middle item must be compared to another item so that it too is allowed to move.
Although this algorithm is so far not quite competitive with techniques such as quicksort, I encourage others to explore the idea further as it may have some usefulness when combined with merging, or perhaps a final insertion sort pass. If this algorithm can be made more adaptive it may indeed one day become competitive with quicksort. I certainly believe there is more potential for speed gains here.
This algorithm is NOT stable.

As far as I am aware, this algorithm is my own creation. Therefore I do not yet have any links to other sites with the algorithm.

[show source code]
Squeeze Sort

This algorithm is in some ways similar to Strand Sort, except that the runs are not obtained from consecutive items. The runs are instead obtained by scanning through the array looking for items that can extend the run in either direction. This starts with the first item in the array each time, and builds the longest run it can from there, by examining every item in order. In the process, the unsorted items are moved over to fill up the gap where each item taken for the run was. The run, which is stored in a secondary buffer, is then merged with the sorted portion after each pass, and put back into the original array where space was made for it when items were moved over earlier.
This algorithm is NOT stable.

[show source code]
http://www.codeproject.com/useritems/SQ_Srt_New.asp
http://www.articleboy.com/Article/Squeeze-Sort--A-New-Sorting-Technique/1188

Non-Comparison-Based Sort Techniques

Flash Sort

All of the algorithms so far are comparison-based, and as such cannot do better than O (nlogn) running time in the worse case. To do any better we need to compare keys in a more useful manner than simply determining which of two is bigger. We can take advantage of extra information, such as the difference between the keys to get a much better idea of how far apart they should be. Once we find the lowest and the highest items, we can work out approximately where the other items should go, by the relative distance of their keys. This is the principle behind Flash Sort. As with Mean Sort, this algorithm works best when the distribution of values is roughly linear.
Flash Sort allows a great degree of control with regards to the memory-usage/speed tradeoff. In the following implementation I have pinned the minimum and maximum memory usage as well.
Flash Sort is NOT stable.

[show source code]

http://www.neubert.net/FSOIntro.html
http://www.nist.gov/dads/HTML/flashsort.html

Radix Sort
One can go even further towards extracting more information from the keys. Going beyond subtracting keys as in Flash Sort, you can actually sort by the exact bytes of the key, one at a time, or by some other number of bits of the key at once. This technique is not very general as it can't work for every type of data, and typically requires modifications for each data type you wish to use it on. As it sorts according to the least-significant bits first, the internal algorithm used "Counting Sort" is required to be a stable algorithm. Counting Sort is so data-specific that I don't bother with it at all.
The great news is that unlike Flash Sort, this is not affected by non-linear distributions.
Radix Sort is stable.

[show source code]

http://en.wikipedia.org/wiki/Radix_sort
http://www.nist.gov/dads/HTML/radixsort.html

Unfeasible Sort Techniques

Gnome Sort
Well, now that we've reached the fastest sorting algorithms, let take a look at some of the slowest.
Gnome Sort is extremely simple. Compare an item with its neighbour, if it is out of order, swap them and move back (not past the start), otherwise move forward. Stop when you reach the end. This actually works basically like Insertion Sort, but less efficiently.
This algorithm is O(n2), but is stable.

[show source code]

http://en.wikipedia.org/wiki/Gnome_sort
http://www.nist.gov/dads/HTML/gnomeSort.html

Stupid Sort
Can we do worse than Gnome Sort? Sure, go back to the start after finding items out of order and swapping them. This algorithm is so bad that it is O(n3), but it is stable.

[show source code]

Stooge Sort

Here's a bazaar idea. Can we sort with no loop whatsoever? Sure by using the powers of recursion we can develop Stooge Sort. This algorithm is so bad that it is O(n2.7)
Stooge Sort is NOT stable.

[show source code]

http://en.wikipedia.org/wiki/Stooge_sort
http://www.nist.gov/dads/HTML/stoogesort.html

Fib Sort
Just as Stooge Sort has no loop, neither does Fib Sort. The running time of this algorithm with n items, relates to the nth Fibonacci number, hence the name.
Fib Sort is NOT stable.

[show source code]

http://www.ics.uci.edu/~eppstein/161/exams/mt1.pdf

Slow Sort
Okay, now things just get really insane. This algorithm containing 3 recursive calls is actually designed to be as innocently looking and yet as inefficient as possible. Hey, if you can’t design the fastest algorithm, why not go for the slowest! This algorithm manages to be about as bad as O(n(logn)/2) even though on the surface it appears to contain code similar to Stooge Sort. The authors (Andrei Broder & Jorge Stolfi) don't mention this at all, but this sort is NOT stable.

[show source code]

http://www.cs.uiuc.edu/class/fa05/cs473ug/resources/pessimal-algorithms.pdf
http://de.wikipedia.org/wiki/Slowsort 

Perm Sort
Hmm, can we get any slower than Slow Sort? You bet! We can try every permutation of the items until we find a match. The number of permutations gets ridiculously high very quickly, and this algorithm swaps items like crazy, so you'll probably literally grow old waiting for it, but what the heck. What's more it uses a lot of stack space
This worst-case algorithm is O(n*n!) and it is NOT stable either.

[show source code]

http://cg.scs.carleton.ca/~morin/misc/sortalg/

Bozo Sort
In Bozo Sort, we just swap items randomly until the whole lot just happen to be in sorted order. As you can imagine, the chance of it even succeeding aren't that great, and if it does, it most likely isn't going to be quick.
Bozo Sort is NOT stable.

[show source code]

http://en.wikipedia.org/wiki/Bozo_sort

On To List Sorting >

Download Mega Sorting Demo program (104933 bytes zip file)