In previous work we introduced an average-case measure for the time complexity of Boolean circuits. Instead of the fixed circuit depth, for each input we take the minimal number of time steps necessary to perform the computation for that particular input, using gates that forward their output values as soon as possible. This measure is called delay. Based on it, the complexity of a whole class of functions that can be described as prefix computations has been analysed in detail. Here we consider the problem of sorting large integers given in binary notation. In contrast to a word comparator sorting circuit C, where a basic computational element, a comparator, is charged a single time step to compare two elements, in a bit comparator circuit C' a comparison of two binary numbers has to be implemented by a Boolean subcircuit CM, called a comparator module, that is built from Boolean gates of bounded fan-in. Thus, compared to C, the depth of C' will be larger by a factor of up to the depth of CM. Our goal is to minimize the average delay of bit comparator sorting circuits. The worst-case delay can be estimated by the depth of the circuit. For this worst-case measure, two topologically quite different designs seem appropriate for the comparator modules: a tree-like one if the inputs are long numbers, and otherwise a linear array working in a pipelined fashion. Inserting these into a word comparator circuit, we get bit-level sorting circuits for binary numbers of length m whose depth is increased either by a multiplicative factor of order log m or by an additive term of order m. We show that these obvious solutions can be improved significantly by constructing efficient sorting and merging circuits for the bit model that suffer only a constant-factor time loss on average if the inputs are uniformly distributed. This is done by designing suitable hybrid architectures that combine tree compaction and pipelining.
These results can also be extended to classes of nonuniform distributions if we put a bound on the complexity of the distributions themselves.
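
To give some intuition for why the average delay of a single comparison can be much smaller than its worst case, the following Python sketch models a comparator that scans two m-bit numbers from the most significant bit downwards and stops at the first position where they differ. This is a simplified illustration and not the circuit construction of the paper: the function names `msb_first_delay` and `average_delay` are hypothetical, and the assumption that one bit position costs one time step is ours.

```python
from itertools import product


def msb_first_delay(a: int, b: int, m: int) -> int:
    """Number of bit positions inspected, most significant bit first,
    until the order of the m-bit numbers a and b is decided.

    Assumption (not from the source): one bit position costs one time step.
    """
    for step in range(1, m + 1):
        shift = m - step  # position of the bit examined in this step
        if (a >> shift) & 1 != (b >> shift) & 1:
            return step  # first differing bit decides the comparison
    return m  # equal inputs: every bit has to be inspected


def average_delay(m: int) -> float:
    """Exact average of msb_first_delay over all pairs of m-bit inputs,
    i.e. under the uniform distribution."""
    total = sum(
        msb_first_delay(a, b, m)
        for a, b in product(range(2 ** m), repeat=2)
    )
    return total / 4 ** m


if __name__ == "__main__":
    for m in (2, 4, 8):
        print(m, average_delay(m))
```

In this toy model the worst case is m steps, while the average over uniformly distributed inputs is 2 - 2^(1-m), which stays below 2 for every m. For m = 4 this gives 1.875. The model covers only an isolated comparison; it says nothing about how delays interact inside a full sorting network, which is the problem the abstract addresses.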