Amortized efficiency of list update and paging rules
Communications of the ACM
STOC '86 Proceedings of the eighteenth annual ACM symposium on Theory of computing
A model for hierarchical memory
STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
Optimal and sublogarithmic time randomized parallel sorting algorithms
SIAM Journal on Computing
A bridging model for parallel computation
Communications of the ACM
An introduction to parallel algorithms
An introduction to parallel algorithms
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Programming parallel algorithms
Communications of the ACM
Cilk: an efficient multithreaded runtime system
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Provably efficient scheduling for languages with fine-grained parallelism
Journal of the ACM (JACM)
External-memory graph algorithms
Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms
Samplesort: A Sampling Approach to Minimal Storage Tree Sorting
Journal of the ACM (JACM)
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
Cache-oblivious priority queue and graph algorithm applications
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
Dag-Consistent Distributed Shared Memory
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Cache Oblivious Distribution Sweeping
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Proximity Mergesort: optimal in-place sorting in the cache-oblivious model
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Effectively sharing a cache among threads
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
The cache complexity of multithreaded cache oblivious algorithms
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Engineering a cache-oblivious sorting algorithm
Journal of Experimental Algorithmics (JEA)
Models for parallel and hierarchical computation
Proceedings of the 4th international conference on Computing frontiers
Optimal sparse matrix dense vector multiplication in the I/O-model
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Provably good multicore cache performance for divide-and-conquer algorithms
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
A consistency architecture for hierarchical shared caches
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Fundamental parallel algorithms for private-cache chip multiprocessors
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Cache-efficient dynamic programming algorithms for multicores
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
A unified model for multicore architectures
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Brief announcement: low depth cache-oblivious sorting
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Cache-aware and cache-oblivious adaptive sorting
ICALP'05 Proceedings of the 32nd international conference on Automata, Languages and Programming
Parallel approximation algorithms for facility-location problems
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Hierarchical Diagonal Blocking and Precision Reduction Applied to Combinatorial Multigrid
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Brief announcement: paging for multicore processors
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Scheduling irregular parallel computations on hierarchical caches
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Balance principles for algorithm-architecture co-design
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Paging for multi-core shared caches
Proceedings of the 3rd Innovations in Theoretical Computer Science Conference
Internally deterministic parallel algorithms can be fast
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Parallel and I/O efficient set covering algorithms
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Can traditional programming bridge the Ninja performance gap for parallel computing applications?
Proceedings of the 39th Annual International Symposium on Computer Architecture
Computational geometry in the parallel external memory model
SIGSPATIAL Special
On the bit-complexity of sparse polynomial and series multiplication
Journal of Symbolic Computation
Program-centric cost models for locality
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Parallel triangle counting in massive streaming graphs
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.00 |
In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on parallel machines with private or shared caches. The approach is to design nested-parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators. Using known mappings, our results lead to low cache complexities on shared-memory multiprocessors with a single level of private caches or a single shared cache. We generalize these mappings to multi-level cache hierarchies of private or shared caches, implying that our algorithms also have low cache complexities on such hierarchies. The key factor in obtaining these low parallel cache complexities is the low depth of the algorithms we propose.