Nov. 29th, 2015

Sorting

Nov. 29th, 2015 08:44 pm

Streamable sorting

A streamable sorting algorithm is one that has an intermediate state in which the entire set may not yet be sorted, but some number of items at the head are guaranteed to be in their final positions. For example, a single pass of the bubble sort brings the optimal item (the smallest, for an ascending sort) to the head of the set before the rest of the set is sorted. The sorted elements at the head may be dispatched while the sorting algorithm continues to run on the remainder of the list.
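As a rough illustration of the idea, here is a minimal Python sketch of a bubble sort written as a generator. The function name `streaming_bubble_sort` and the choice to sweep from the tail toward the head are assumptions made for this example. Each pass fixes one more item at the head and hands it to the consumer right away.

```python
from typing import Iterable, Iterator, List


def streaming_bubble_sort(items: Iterable[int]) -> Iterator[int]:
    """Yield items in ascending order, one per bubble-sort pass."""
    data: List[int] = list(items)
    n = len(data)
    for head in range(n):
        # Sweep from the tail toward the head so that the smallest
        # remaining item ends up at index `head`.
        for i in range(n - 1, head, -1):
            if data[i] < data[i - 1]:
                data[i], data[i - 1] = data[i - 1], data[i]
        # data[head] is now in its final position and can be dispatched
        # while the rest of the list is still unsorted.
        yield data[head]
```

A consumer can call `next()` on the generator and start working on the first item after a single pass, long before the whole list is sorted.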

The use case is piping data to the next operation in a multi-stage process so that the entire process completes sooner than if it had waited for the whole set to be sorted. This could be in shell scripting or any I/O-bound transfer of data. Another use case is a user-facing interface, where the appearance of doing something is more important than total runtime.

Speed considerations

A strong argument against use of a streamable sorting algorithm is that non-streamable algorithms are so much faster than the bubble sort that the process is likely to complete sooner if one chooses the better algorithm and waits for it to complete.

Made-up example:

  1. Algorithm A will sort in 10 seconds and can start transferring after 4 seconds because it knows that the head is optimal.
  2. Algorithm B will sort in 5 seconds, but I/O cannot begin transfer until all items are sorted.

The slower algorithm will get the data across sooner if certain conditions are true:

  1. Transfer time is longer than the sort time of the faster algorithm.
  2. The transfer process will not interfere with the sorting process. On modern multi-CPU systems this should not be a problem.
  3. The fraction of the streamable algorithm's sort time at which transfer may begin (say, 40%) is lower than the ratio of the non-streamable algorithm's sort time to the streamable algorithm's sort time (say, algorithm B finishes sorting in 50% of the time the streamable algorithm takes). Note that this will depend heavily on the set size due to the performance differences in sorting algorithms.

Variables:

  • Tx = (Time to transfer X objects)
  • At = (start of transfer for algorithm A)
  • Bt = (start of transfer for algorithm B)

When transfer time is longer than the sorting time of either sort algorithm, the overall process is bound by when the transfer begins. The streamable algorithm will be faster when it begins transfer before the non-streamable algorithm would finish sorting and begin its own transfer (that is, when At is less than Bt).

When transfer time is shorter than the sort time of the slower, streamable algorithm, the streamable algorithm is bound by its own slowness. The process will be faster than with a non-streamable algorithm only if transfer time remains long enough for (Tx + Bt) to be greater than the time needed for the streamable algorithm to finish sorting.
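The two cases can be checked with a small Python sketch. This is a simplified model and an assumption of this example, not a measurement: the streamable process is taken to finish at whichever is later, the end of its sort or the end of a transfer that started at At, and the cost of sending the very last item is ignored. The function names are made up for illustration.

```python
def streamable_total(sort_time: float, transfer_start: float,
                     transfer_time: float) -> float:
    """Finish time when transfer overlaps the sort (simplified model)."""
    # The process cannot end before the sort does, nor before a transfer
    # that began at `transfer_start` has run for `transfer_time`.
    return max(transfer_start + transfer_time, sort_time)


def non_streamable_total(sort_time: float, transfer_time: float) -> float:
    """Finish time when transfer starts only after the sort completes."""
    # Bt equals the full sort time of the non-streamable algorithm.
    return sort_time + transfer_time


def streamable_wins(sort_a: float, start_a: float, sort_b: float,
                    transfer_time: float) -> bool:
    """True when the slower, streamable algorithm finishes first."""
    return (streamable_total(sort_a, start_a, transfer_time)
            < non_streamable_total(sort_b, transfer_time))
```

Using the made-up numbers above (A sorts in 10 seconds and starts transfer at 4, B sorts in 5): with Tx = 8 the streamable process finishes at 12 seconds against 13 for B, while with Tx = 2 it finishes at 10 seconds against 7 for B.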


Dividing sort time across sender and receiver

Imagine the following:

  • We need to sort and transfer data between two systems that both need to minimize CPU use.
  • We have a sorting algorithm that divides the set into chunks and sorts the chunks over multiple passes.
  • We can predict the number of passes needed by the algorithm from the set size.

Assuming that it is still the 1990s, when these limits would matter (this is more of an intellectual exercise than a serious proposal), we can divide the effort of sorting between client and server. The sender runs half of the expected number of passes and sends the data partially sorted. The receiver finishes sorting the half-sorted data as it is received, inserting the items coming down the pipe where they best fit.
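One way this might look, as a hedged Python sketch: the chunked algorithm is assumed here to be a bottom-up merge sort, whose pass count can be predicted as the ceiling of log2 of the set size. The names `expected_passes`, `merge_pass`, `sender`, and `receiver` are hypothetical, and the "pipe" is just a Python generator standing in for a real connection.

```python
import bisect
import heapq
from typing import Iterable, Iterator, List


def expected_passes(n: int) -> int:
    """Passes a bottom-up merge sort needs for n items: ceil(log2(n))."""
    return max(n - 1, 0).bit_length()


def merge_pass(data: List[int], width: int) -> List[int]:
    """Merge each pair of adjacent sorted runs of length `width`."""
    out: List[int] = []
    for start in range(0, len(data), 2 * width):
        left = data[start:start + width]
        right = data[start + width:start + 2 * width]
        out.extend(heapq.merge(left, right))
    return out


def sender(data: List[int]) -> Iterator[int]:
    """Run half of the expected passes, then stream the partial result."""
    width = 1
    for _ in range(expected_passes(len(data)) // 2):
        data = merge_pass(data, width)
        width *= 2
    yield from data


def receiver(stream: Iterable[int]) -> List[int]:
    """Finish the sort by inserting each arriving item where it fits."""
    result: List[int] = []
    for item in stream:
        bisect.insort(result, item)
    return result
```

Calling `receiver(sender(data))` returns the fully sorted list. Note that this receiver does not actually take advantage of the runs the sender produced; a receiver that knew the run length could merge runs instead of inserting item by item.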

Let us say that we have a data object and we want to manage metadata about the object. There are multiple ways of handling this.
