STOC '86 Proceedings of the eighteenth annual ACM symposium on Theory of computing
Faster optimal parallel prefix sums and list ranking
Information and Computation
A simple randomized parallel algorithm for list-ranking
Information Processing Letters
Computer organization & design: the hardware/software interface
List ranking and list scan on the Cray C-90
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Efficient massively parallel implementation of some combinatorial algorithms
Theoretical Computer Science
Better trade-offs for parallel list ranking
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
From algorithm parallelism to instruction-level parallelism: an encode-decode chain using prefix-sum
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Explicit multi-threading (XMT) bridging models for instruction parallelism (extended abstract)
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Computer architecture (2nd ed.): a quantitative approach
VLSI Architecture: Past, Present, and Future
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Randomized speed-ups in parallel computation
STOC '84 Proceedings of the sixteenth annual ACM symposium on Theory of computing
Parallel tree contraction and its application
SFCS '85 Proceedings of the 26th Annual Symposium on Foundations of Computer Science
Algorithms for the problem of list ranking are empirically studied with respect to the Explicit Multi-Threaded (XMT) platform for instruction-level parallelism (ILP). The main goal of this study is to understand the differences between XMT and more traditional parallel computing implementation platforms/models as they pertain to the well-studied list ranking problem. The two main findings are: (i) good speedups are possible for much smaller inputs; (ii) this is due in part to the competitive performance of a new variant of a 1984 algorithm, called the No-Cut algorithm. The paper incorporates analytic (non-asymptotic) performance analysis into experimental performance analysis for relatively small inputs. This provides an interesting example where experimental research and theoretical analysis complement one another.

1 Explicit Multi-Threading (XMT) is a fine-grained computation framework introduced in our SPAA'98 paper. Building on some key ideas of parallel computing, XMT covers the spectrum from algorithms through architecture to implementation; its main implementation-related innovation is the incorporation of low-overhead hardware and software mechanisms for more effective fine-grained parallelism. The reader is referred to that paper for details on these mechanisms. The XMT platform aims at faster single-task completion time by way of ILP.
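
For readers unfamiliar with the list ranking problem named in the abstract: given a linked list stored as an array of successor indices, compute for every node its distance to the end of the list. The sketch below is a minimal sequential simulation of the classic pointer-jumping approach (commonly attributed to Wyllie). It is not the No-Cut algorithm and not XMT code. The function name `list_rank` and the convention that the tail node points to itself are assumptions made for illustration.

```python
def list_rank(succ):
    """Return rank[i] = number of links from node i to the tail.

    Assumption: succ[i] is the index of node i's successor, and the
    tail node points to itself.
    """
    n = len(succ)
    nxt = list(succ)
    # Every node except the tail starts one link away from its successor.
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    # Each round doubles the distance spanned by every pointer, so
    # about log2(n) rounds are enough to reach the tail from any node.
    for _ in range(max(1, (n - 1).bit_length())):
        # New lists are built from the old ones to mimic one synchronous
        # parallel step in which all nodes update at the same time.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank


# The list 3 -> 1 -> 0 -> 2 -> 4, with node 4 as the tail.
print(list_rank([2, 0, 4, 1, 4]))  # [2, 3, 1, 4, 0]
```

Pointer jumping performs on the order of n log n operations in total, which is more than the linear work of a sequential traversal. Work-efficient algorithms of the kind the abstract refers to are designed to close that gap.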