The cache hierarchy prevalent in today's high-performance processors has to be taken into account in order to design algorithms that perform well in practice. This paper advocates the adaptation of external-memory algorithms to this purpose. The idea and the practical issues involved are exemplified by engineering a fast priority queue, based on k-way merging, that is suited to both external memory and cached memory. It improves on previous external-memory algorithms by constant factors that are crucial for transferring the approach to cached memory. Running in the cache hierarchy of a workstation, the algorithm is at least two times faster than optimized implementations of binary heaps and 4-ary heaps for large inputs.
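To make the idea of a priority queue built on k-way merging concrete, the following is a minimal sketch and not the paper's implementation. It assumes a single level of sorted runs, uses Python's `heapq` in place of a tuned merger, and does not model cache lines or disk blocks. The class name `MergeBasedPQ` and the parameter `buffer_size` are hypothetical names chosen for illustration.

```python
import heapq


class MergeBasedPQ:
    """Toy priority queue: a small insertion heap plus sorted runs merged k ways.

    Illustrative only: one level of runs, no blocking, no cache modeling.
    """

    def __init__(self, buffer_size=4):
        self.buffer_size = buffer_size  # hypothetical capacity of the insertion buffer
        self.buffer = []                # small binary heap holding recent insertions
        self.runs = []                  # sorted runs, each consumed front to back
        self.heads = []                 # heap of (key, run_index, position) run heads

    def insert(self, key):
        heapq.heappush(self.buffer, key)
        if len(self.buffer) > self.buffer_size:
            self._flush()

    def _flush(self):
        # Turn the full insertion buffer into one sorted run and register its head.
        run = sorted(self.buffer)
        self.buffer = []
        self.runs.append(run)
        heapq.heappush(self.heads, (run[0], len(self.runs) - 1, 0))

    def delete_min(self):
        if not self.buffer and not self.heads:
            raise IndexError("delete_min from an empty queue")
        # The minimum is either the top of the insertion heap or the smallest run head.
        if not self.heads or (self.buffer and self.buffer[0] <= self.heads[0][0]):
            return heapq.heappop(self.buffer)
        key, r, pos = heapq.heappop(self.heads)
        if pos + 1 < len(self.runs[r]):
            # Advance within the run: this is the k-way merge step.
            heapq.heappush(self.heads, (self.runs[r][pos + 1], r, pos + 1))
        else:
            self.runs[r] = []           # run exhausted; release its storage
        return key


if __name__ == "__main__":
    pq = MergeBasedPQ(buffer_size=3)
    for x in [5, 1, 9, 3, 7, 2, 8, 6, 4, 0]:
        pq.insert(x)
    print([pq.delete_min() for _ in range(10)])
```

In this sketch most elements are touched by sequential scans over sorted runs rather than by scattered heap accesses, which is the general reason merge-based designs tend to be friendlier to a memory hierarchy than a single large binary heap. The constant-factor engineering the abstract refers to is not reproduced here.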