A model for hierarchical memory
STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
The input/output complexity of sorting and related problems
Communications of the ACM
Can parallel algorithms enhance serial implementation?
Communications of the ACM
Asymptotically Tight Bounds for Performing BMMC Permutations on Parallel Disk Systems
SIAM Journal on Computing
Optimal read-once parallel disk scheduling
Proceedings of the sixth workshop on I/O in parallel and distributed systems
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Towards a theory of cache-efficient algorithms
Journal of the ACM (JACM)
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
Getting More from Out-of-Core Columnsort
ALENEX '02 Revised Papers from the 4th International Workshop on Algorithm Engineering and Experiments
An overview of the BlueGene/L Supercomputer
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Asynchronous parallel disk sorting
Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures
On the limits of cache-obliviousness
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
I/O complexity: The red-blue pebble game
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
Adaptive disk scheduling in a multimedia DBMS
MULTIMEDIA '03 Proceedings of the eleventh ACM international conference on Multimedia
Data Cache Prefetching Using a Global History Buffer
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
I/O-efficient undirected shortest paths with unbounded edge lengths
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Hierarchical memory with block transfer
SFCS '87 Proceedings of the 28th Annual Symposium on Foundations of Computer Science
External-memory models, most notably the I-O model [3], capture the effects of the memory hierarchy and aid in algorithm design. More than a decade of architectural advances has led to new features that the I-O model does not capture, most notably prefetching. We propose a relatively simple Prefetch model that incorporates data prefetching into the traditional I-O models, and we show how to design algorithms that attain close to peak memory bandwidth. Unlike (the inverse of) memory latency, memory bandwidth is much closer to the processing speed; intelligent use of prefetching can therefore considerably mitigate the I-O bottleneck. For some fundamental problems, our algorithms attain running times approaching those of the idealized Random Access Machine under reasonable assumptions. Our work also explains why I-O-efficient algorithms perform significantly better on systems that support prefetching than on systems that do not.