The introduction of parallel models that account for communication between processors has shown that interprocessor bandwidth is often the limiting factor in parallel computing. In this paper, we introduce a new coding technique that transmits the XOR of carefully selected patterns of the bits to be communicated, which greatly reduces bandwidth requirements in some settings. The technique also has broader applications: for example, we demonstrate a surprising application to a simple I/O (input/output) complexity problem related to finding the transpose of a matrix. Our main results are developed in the PRAM(M) model, a limited-bandwidth PRAM model in which P processors communicate through a small globally shared memory of M bits. We provide new algorithms for the problems of sorting and permutation routing. For the concurrent read PRAM(M), as P grows with M held constant, our sorting algorithm outperforms any previous algorithm by O(log^c P) for any constant c. Combined with a known lower bound for sorting in the exclusive read PRAM(M) model, this algorithm implies that the concurrent read PRAM(M) is strictly more powerful than the exclusive read PRAM(M).
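The following is a minimal sketch of the general idea behind XOR coding over a shared, concurrently readable cell. It is not the algorithm of the paper. The setting is an assumption chosen for illustration: each of k processors already knows every word except one, and a single XOR word written to shared memory lets all of them recover their missing word with one concurrent read. The function names `encode` and `decode` are hypothetical.

```python
from functools import reduce
from operator import xor


def encode(words):
    """Return the XOR of all words; this is the single value placed in shared memory."""
    return reduce(xor, words, 0)


def decode(shared, known_words):
    """Recover the one missing word by XORing the shared value with every known word."""
    return reduce(xor, known_words, shared)


# Illustrative data (assumed): three words, processor i is missing words[i].
words = [5, 9, 12]
shared = encode(words)

# Every processor reads the same shared cell and cancels out the words it knows.
recovered = [decode(shared, words[:i] + words[i + 1:]) for i in range(len(words))]
```

In this toy setting one shared word replaces k separate transmissions, which is why concurrent reads of a small shared memory can be more effective than exclusive reads.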