• Any operation may take operands (parameters, arguments). Example: function foo($arg1, $arg2, $arg3)
  • Any operand may have attributes. Example: f(long int $arg1, fancy blue dishwasher $arg2)
  • The reader (compiler or interpreter) takes operation + operands and produces instructions that will have the desired effect. This can be done at runtime or at compile time.

Operations, operands, and attributes might be seen respectively as Verbs, Nouns, and Adjectives.

Which instructions run depends upon the attributes of the operands. For the simplest example, the code "2 + x" will produce different instructions depending upon the attributes of x:

small int x = 42
return 2 + x // add ax, 2   (16-bit integer add)

long int x = 42
return 2 + x // add eax, 2  (32-bit integer add)

float x = 42.0
return 2 + x // fadd        (x87 floating-point add to st0)

At some point the system determines "x is an int, use integer add" or "x is a float, use floating point add". The process of compilation can be described as the resolution of high-level code to lower-level implementation. This may involve translation through intermediary languages.
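As a toy sketch (not any particular compiler's implementation), that resolution step can be modeled as a lookup from an operand's attribute to an instruction; all names here are made up for illustration:

```javascript
// Toy instruction selector: pick an add instruction from the
// attribute (type) of the operand. Hypothetical names throughout.
function selectAdd(operandType) {
  switch (operandType) {
    case "small int": return "add ax, 2";   // 16-bit register add
    case "long int":  return "add eax, 2";  // 32-bit register add
    case "float":     return "fadd";        // x87 floating-point add
    default: throw new Error("unknown operand type: " + operandType);
  }
}

console.log(selectAdd("small int")); // "add ax, 2"
```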

An interpreted high-level language will make these determinations at run time. A compiler may make the process more efficient by stripping all of these determinations out at compile time. At the extreme of efficiency, the reader may determine that the code will always produce the same result (42 + 2 = 44) and insert the resulting value rather than the code to produce it.
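That last step is constant folding. A minimal sketch of the idea, assuming a toy expression node shaped `{ op, left, right }`:

```javascript
// Minimal constant-folding sketch: if both operands of an addition
// node are literal numbers, replace the node with its computed value;
// otherwise leave the code alone to be resolved at run time.
function fold(node) {
  if (node.op === "+" &&
      typeof node.left === "number" && typeof node.right === "number") {
    return node.left + node.right;  // emit the value, not the code
  }
  return node;                      // non-constant: keep the node
}

console.log(fold({ op: "+", left: 42, right: 2 }));  // 44
console.log(fold({ op: "+", left: "x", right: 2 })); // unchanged node
```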

Optimization vs. reuse

A downside of such optimizations is that it becomes impossible to modify the behavior of a program at runtime if the behavior-determining code is optimized out. Correspondingly, an advantage to keeping code unoptimized is that the program may be modified by a wrapper that changes the resolution of a function call.

A system could provide the best of both worlds by using an object format that stores both the optimized version and a high-level interpretation, and recompiling on the fly when directed to change part of the program. To allow the recompilation of one subroutine at a time, instead of rebuilding the whole program, the metadata may include a catalogue of entry points.
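A rough sketch of what such a dual-format object might look like — every name here is hypothetical, and `new Function` stands in for a real compiler:

```javascript
// Sketch of the proposed object format: each entry point in the
// catalogue keeps both its high-level source and a compiled form,
// so one subroutine can be recompiled without rebuilding the program.
function makeProgram(compile) {
  const entryPoints = {}; // catalogue: name -> { source, compiled }
  return {
    define(name, source) {
      entryPoints[name] = { source, compiled: compile(source) };
    },
    call(name, arg) {
      return entryPoints[name].compiled(arg);
    },
    // Recompile just one subroutine, e.g. when a wrapper changes it.
    redefine(name, transform) {
      const src = transform(entryPoints[name].source);
      entryPoints[name] = { source: src, compiled: compile(src) };
    },
  };
}

const prog = makeProgram(src => new Function("x", "return " + src));
prog.define("f", "x + 1");
console.log(prog.call("f", 1)); // 2
prog.redefine("f", s => "(" + s + ") * 10");
console.log(prog.call("f", 1)); // 20
```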

Sorting

Nov. 29th, 2015 08:44 pm

Streamable sorting

A streamable sorting algorithm is one that has an intermediary state in which the entire set may not yet be sorted, but it can be guaranteed that some number of items at the head are sorted. For example, the bubble sort will bring the optimal item to the head of the set before the rest of the set is sorted. The sorted elements at the head may be dispatched while the sorting algorithm continues to run on the remainder of the list.
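A minimal sketch of this as a generator, using a bubble sort that bubbles the smallest remaining item toward the head on each pass:

```javascript
// Streamable bubble sort sketch: after pass i, position a[i] is final,
// so the head item can be dispatched (yielded) while sorting continues
// on the remainder of the list.
function* streamSort(items) {
  const a = items.slice();
  for (let i = 0; i < a.length; i++) {
    // Bubble toward the head: smallest remaining item reaches a[i].
    for (let j = a.length - 1; j > i; j--) {
      if (a[j] < a[j - 1]) [a[j], a[j - 1]] = [a[j - 1], a[j]];
    }
    yield a[i]; // safe to transfer immediately
  }
}

console.log([...streamSort([5, 1, 4, 2, 3])]); // [1, 2, 3, 4, 5]
```

A consumer can begin piping each yielded item downstream before the generator has finished sorting the rest.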

The use case is to pipe data to a different operation in a multi-stage process so that the entire process completes more quickly than if it had waited for the whole set to be sorted. This could be in shell scripting or any I/O-bound transfer of data. Another use case is in a user-facing interface where the appearance of doing something is more important than total runtime.

Speed considerations

A strong argument against use of a streamable sorting algorithm is that non-streamable algorithms are so much faster than the bubble sort that the process is likely to complete sooner if one chooses the better algorithm and waits for it to complete.

Made-up example:

  1. Algorithm A will sort in 10 seconds and can start transferring after 4 seconds because it knows that the head is optimal.
  2. Algorithm B will sort in 5 seconds, but I/O cannot begin transfer until all items are sorted.

The slower algorithm will get the data across sooner if certain conditions are true:

  1. Transfer time is slower than the sort time of the faster algorithm.
  2. The transfer process will not interfere with the sorting process. On modern multi-CPU systems this should not be a problem.
  3. The fraction of sort time at which transfer may begin (say, 40%) is lower than the ratio of the non-streamable algorithm's sort time to the streamable algorithm's (say, algorithm B finishes sorting in 50% of the time of the streamable algorithm). Note that this will depend heavily on the set size due to the performance differences between sorting algorithms.

Variables:

  • Tx = (Time to transfer X objects)
  • At = (start of transfer for algorithm A)
  • Bt = (start of transfer for algorithm B)

When transfer time is slower than the sorting time of either sort algorithm, the overall process is bound by when the transfer begins. The streamable algorithm will be faster when it begins transfer before the non-streamable algorithm would finish and begin transfer.

When transfer time is faster than the slower streamable sorting algorithm, the streamable algorithm is bound by its own slowness. The process will be faster than a non-streamable algorithm only if transfer time remains slow enough for (Tx + Bt) to be greater than the time needed for the streamable algorithm to finish sorting.
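Using the made-up numbers from above, a rough completion-time model (a sketch, not a benchmark — it assumes transfer proceeds at a steady rate once begun):

```javascript
// Rough completion-time model for the two pipelines.
// sortA / startA: streamable algorithm's total sort time and the point
// at which its transfer may begin; sortB: non-streamable sort time;
// tx: time to transfer the whole set. All values are made up.
function completionTimes(sortA, startA, sortB, tx) {
  return {
    // Streamable: bound by the transfer (started early) or by the
    // sort itself, whichever finishes last.
    streamable: Math.max(startA + tx, sortA),
    // Non-streamable: transfer cannot start until sorting is done.
    nonStreamable: sortB + tx,
  };
}

console.log(completionTimes(10, 4, 5, 8)); // slow transfer: streamable wins (12 vs 13)
console.log(completionTimes(10, 4, 5, 3)); // fast transfer: non-streamable wins (10 vs 8)
```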


Dividing sort time across sender and receiver

Imagine the following:

  • We need to sort and transfer data between two systems that both need to minimize CPU use.
  • We have a sorting algorithm that divides the set into chunks and sorts the chunks over multiple passes.
  • We can predict the number of passes needed by the algorithm from the set size.

Assuming that it is still the 1990s when these limits would matter (this is more of an intellectual exercise than a serious proposal), we can divide the effort of sorting between client and server. The sender runs half of the expected number of passes and sends the data partially-sorted. The receiver finishes sorting the half-sorted data as the data is received and inserts the items coming down the pipe where they best fit.
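The split described above can be sketched roughly like this — the sender runs a limited number of bubble passes, and the receiver places each arriving item where it best fits (effectively an insertion sort, which is cheap on nearly-sorted input); the helper names are hypothetical:

```javascript
// Sender: run only `passes` bubble passes, leaving the set
// partially sorted (the first `passes` items are final).
function senderHalfSort(items, passes) {
  const a = items.slice();
  for (let p = 0; p < passes; p++) {
    for (let j = a.length - 1; j > p; j--) {
      if (a[j] < a[j - 1]) [a[j], a[j - 1]] = [a[j - 1], a[j]];
    }
  }
  return a;
}

// Receiver: insert each item coming down the pipe where it best fits.
function receiverInsert(sorted, item) {
  let i = sorted.length;
  while (i > 0 && sorted[i - 1] > item) i--; // scan for best fit
  sorted.splice(i, 0, item);
}

const wire = senderHalfSort([9, 3, 7, 1, 5, 8], 3); // half of six passes
const out = [];
for (const item of wire) receiverInsert(out, item);
console.log(out); // [1, 3, 5, 7, 8, 9]
```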

I finally got my Javascript implementation of Craig Reynolds's boids algorithm to work. Here is the relevant code:

var weights = [1.0, 1.0, 1.0];

sepvect = sepvect.multiply(weights[0]);
alivect = alivect.multiply(weights[1]);
cohvect = cohvect.multiply(weights[2]);

I just needed to record magnitudes for one run and fiddle with the weights. (0.5,1.0,0.2) seemed to do the trick, though I should test with different numbers of boids to see if the magnitudes are relative to that variable.

As regards "finally", I started on this so long ago that I forget when, and have intermittently picked it up and re-abandoned it since then. The oldest timestamp that I can find for it is 2006, but I think it goes back to 2003 or 2004, when it was going to be something that I would put together during spring break. The biggest problem was a trig error that caused directions to be wrong in one or two of the four quadrants; I fixed it last month after having almost fixed it earlier.

Page generated Aug. 20th, 2017 07:20 pm
Powered by Dreamwidth Studios