Self-adjusting binary search trees
Journal of the ACM (JACM)
An empirical comparison of priority-queue and event-set implementations
Communications of the ACM
SIAM Journal on Computing
A model for hierarchical memory
STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
Algorithms
Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
ACM Transactions on Computer Systems (TOCS)
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Journal of Parallel and Distributed Computing - Special issue: software tools for parallel programming and visualization
Introduction to algorithms
An optimal algorithm for deleting the root of a heap
Information Processing Letters
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Efficient trace-driven simulation methods for cache performance analysis
ACM Transactions on Computer Systems (TOCS)
MemSpy: analyzing memory system bottlenecks in programs
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Performance of priority queue structures in a virtual memory environment
The Computer Journal - Special issue on data structures
A Model of Workloads and its Use in Miss-Rate Prediction for Fully Associative Caches
IEEE Transactions on Computers
Optimizing for parallelism and data locality
ICS '92 Proceedings of the 6th international conference on Supercomputing
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Improving the cache locality of memory allocation
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Design tradeoffs for software-managed TLBs
ACM Transactions on Computer Systems (TOCS)
A study of single-chip processor/cache organizations for large numbers of transistors
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms
IBM Journal of Research and Development
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Data structures and algorithm analysis (2nd ed.)
Data structures and algorithm analysis (2nd ed.)
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
The AlphaServer 8000 series: high-end server platform development
Digital Technical Journal - Special 10th anniversary issue
The art of computer programming, volume 3: (2nd ed.) sorting and searching
The art of computer programming, volume 3: (2nd ed.) sorting and searching
The influence of caches on the performance of sorting
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Performance Analysis of Cache Memories
Journal of the ACM (JACM)
Cache Performance in the VAX-11/780
ACM Transactions on Computer Systems (TOCS)
Operating Systems Theory
The Design and Analysis of Computer Algorithms
The Design and Analysis of Computer Algorithms
Caches and algorithms
Lock bypassing: an efficient algorithm for concurrently accessing priority heaps
Journal of Experimental Algorithmics (JEA)
Cache conscious programming in undergraduate computer science
SIGCSE '99 The proceedings of the thirtieth SIGCSE technical symposium on Computer science education
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Cache performance analysis of traversals and random accesses
Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Fast priority queues for cached memory
Journal of Experimental Algorithmics (JEA)
Performance engineering case study: heap construction
Journal of Experimental Algorithmics (JEA)
An experimental study of priority queues in external memory
Journal of Experimental Algorithmics (JEA)
An efficient profile-analysis framework for data-layout optimizations
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Cache-oblivious priority queue and graph algorithm applications
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Cache oblivious search trees via binary trees of small height
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
High-Performance Algorithm Engineering for Computational Phylogenetics
The Journal of Supercomputing - Special issue on computational issues in fluid dynamics optimization and simulation
The set-associative cache performance of search trees
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
High-Performance Algorithm Engineering for Computational Phylogenetics
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
Optimizing Graph Algorithms for Improved Cache Performance
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Cache Conscious Indexing for Decision-Support in Main Memory
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
A Blocked All-Pairs Shortest-Path Algorithm
SWAT '00 Proceedings of the 7th Scandinavian Workshop on Algorithm Theory
Accessing Multiple Sequences Through Set Associative Caches
ICALP '99 Proceedings of the 26th International Colloquium on Automata, Languages and Programming
Fast Priority Queues for Cached Memory
ALENEX '99 Selected papers from the International Workshop on Algorithm Engineering and Experimentation
Performance Engineering Case Study: Heap Construction
WAE '99 Proceedings of the 3rd International Workshop on Algorithm Engineering
An Experimental Study of Priority Queues in External Memory
WAE '99 Proceedings of the 3rd International Workshop on Algorithm Engineering
Using PRAM Algorithms on a Uniform-Memory-Access Shared-Memory Architecture
WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
Selecting Problems for Algorithm Evaluation
WAE '99 Proceedings of the 3rd International Workshop on Algorithm Engineering
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework
IEEE Transactions on Parallel and Distributed Systems
A heap-based algorithm for the study of one-dimensional particle systems
Journal of Computational Physics
A comparison of cache aware and cache oblivious static search trees using program instrumentation
Experimental algorithmics
Reconstructing optimal phylogenetic trees: a challenge in experimental algorithmics
Experimental algorithmics
Navigation piles with applications to sorting, priority queues, and priority deques
Nordic Journal of Computing
A blocked all-pairs shortest-paths algorithm
Journal of Experimental Algorithmics (JEA)
Optimizing Graph Algorithms for Improved Cache Performance
IEEE Transactions on Parallel and Distributed Systems
Identifying opportunities for automatic remote field cloning
CASCON '04 Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research
The d-deap: a fast and simple cache-aligned d-ary deap
Information Processing Letters
Cache oblivious algorithms for nonserial polyadic programming
The Journal of Supercomputing
Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
An experimental study of sorting and branch prediction
Journal of Experimental Algorithmics (JEA)
Terracost: Computing least-cost-path surfaces for massive grid terrains
Journal of Experimental Algorithmics (JEA)
The d-deap*: a fast and simple cache-aligned d-ary deap
Information Processing Letters
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
Revisiting priority queues for image analysis
Pattern Recognition
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
Shared hardware data structures for hard real-time systems
Proceedings of the tenth ACM international conference on Embedded software
Polynomial division using dynamic arrays, heaps, and packed exponent vectors
CASC'07 Proceedings of the 10th international conference on Computer Algebra in Scientific Computing
ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors
Proceedings of Workshop on General Purpose Processing Using GPUs
As memory access times grow larger relative to processor cycle times, the cache performance of algorithms has an increasingly large impact on overall performance. Unfortunately, most commonly used algorithms were not designed with cache performance in mind. This paper investigates the cache performance of implicit heaps. We present optimizations which significantly reduce the cache misses that heaps incur and improve their overall performance. We present an analytical model called collective analysis that allows cache performance to be predicted as a function of both cache configuration and algorithm configuration. As part of our investigation, we perform an approximate analysis of the cache performance of both traditional heaps and our improved heaps in our model. In addition, empirical data is given for five architectures to show the impact our optimizations have on overall performance. We also revisit a priority queue study originally performed by Jones [25]. Due to the increases in cache miss penalties, the relative performance results we obtain on today's machines differ greatly from the machines of only ten years ago. We compare the performance of implicit heaps, skew heaps and splay trees and discuss the difference between our results and Jones's.