The influence of caches on the performance of heaps

Authors:
Anthony LaMarca;Richard Ladner
Affiliations:
-;-
Venue:
Journal of Experimental Algorithmics (JEA)
Year:
1996

Citing 37
Cited 46

Self-adjusting binary search trees

Journal of the ACM (JACM)
An empirical comparison of priority-queue and event-set implementations

Communications of the ACM
Heaps on heaps

SIAM Journal on Computing
A model for hierarchical memory

STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
Algorithms

Algorithms
Strategies for cache and local memory management by global program transformation

Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
An analytical cache model

ACM Transactions on Computer Systems (TOCS)
Computer architecture: a quantitative approach

Computer architecture: a quantitative approach
A tool to aid in the design, implementation, and understanding of matrix algorithms for parallel processors

Journal of Parallel and Distributed Computing - Special issue: software tools for parallel programming and visualization
Introduction to algorithms

Introduction to algorithms
An optimal algorithm for deleting the root of a heap

Information Processing Letters
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Efficient trace-driven simulation methods for cache performance analysis

ACM Transactions on Computer Systems (TOCS)
MemSpy: analyzing memory system bottlenecks in programs

SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Performance of priority queue structures in a virtual memory environment

The Computer Journal - Special issue on data structures
A Model of Workloads and its Use in Miss-Rate Prediction for Fully Associative Caches

IEEE Transactions on Computers
Optimizing for parallelism and data locality

ICS '92 Proceedings of the 6th international conference on Supercomputing
Global optimizations for parallelism and locality on scalable parallel machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Improving the cache locality of memory allocation

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Expected heights in heaps

BIT
ATOM: a system for building customized program analysis tools

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Cache interference phenomena

SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Design tradeoffs for software-managed TLBs

ACM Transactions on Computer Systems (TOCS)
A study of single-chip processor/cache organizations for large numbers of transistors

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms

IBM Journal of Research and Development
Compiler optimizations for improving data locality

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Data structures and algorithm analysis (2nd ed.)

Data structures and algorithm analysis (2nd ed.)
Unifying data and control transformations for distributed shared-memory machines

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
The AlphaServer 8000 series: high-end server platform development

Digital Technical Journal - Special 10th anniversary issue
The art of computer programming, volume 3: (2nd ed.) sorting and searching

The art of computer programming, volume 3: (2nd ed.) sorting and searching
The influence of caches on the performance of sorting

SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Performance Analysis of Cache Memories

Journal of the ACM (JACM)
Cache Performance in the VAX-11/780

ACM Transactions on Computer Systems (TOCS)
Operating Systems Theory

Operating Systems Theory
The Design and Analysis of Computer Algorithms

The Design and Analysis of Computer Algorithms
Cache Profiling and the SPEC Benchmarks: A Case Study

Computer
Caches and algorithms

Caches and algorithms

Lock bypassing: an efficient algorithm for concurrently accessing priority heaps

Journal of Experimental Algorithmics (JEA)
Cache conscious programming in undergraduate computer science

SIGCSE '99 The proceedings of the thirtieth SIGCSE technical symposium on Computer science education
Cache-conscious structure layout

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Nonlinear array layouts for hierarchical memory systems

ICS '99 Proceedings of the 13th international conference on Supercomputing
The influence of caches on the performance of sorting

SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Cache performance analysis of traversals and random accesses

Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Fast priority queues for cached memory

Journal of Experimental Algorithmics (JEA)
Performance engineering case study: heap construction

Journal of Experimental Algorithmics (JEA)
An experimental study of priority queues in external memory

Journal of Experimental Algorithmics (JEA)
An efficient profile-analysis framework for data-layout optimizations

POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Cache-oblivious priority queue and graph algorithm applications

STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Cache oblivious search trees via binary trees of small height

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
High-Performance Algorithm Engineering for Computational Phylogenetics

The Journal of Supercomputing - Special issue on computational issues in fluid dynamics optimization and simulation
The set-associative cache performance of search trees

SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
High-Performance Algorithm Engineering for Computational Phylogenetics

ICCS '01 Proceedings of the International Conference on Computational Science-Part II
Optimizing Graph Algorithms for Improved Cache Performance

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Cache Conscious Indexing for Decision-Support in Main Memory

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
A Blocked All-Pairs Shortest-Path Algorithm

SWAT '00 Proceedings of the 7th Scandinavian Workshop on Algorithm Theory
Accessing Multiple Sequences Through Set Associative Caches

ICALP '99 Proceedings of the 26th International Colloquium on Automata, Languages and Programming
Fast Priority Queues for Cached Memory

ALENEX '99 Selected papers from the International Workshop on Algorithm Engineering and Experimentation
Performance Engineering Case Study: Heap Construction

WAE '99 Proceedings of the 3rd International Workshop on Algorithm Engineering
An Experimental Study of Priority Queues in External Memory

WAE '99 Proceedings of the 3rd International Workshop on Algorithm Engineering
Using PRAM Algorithms on a Uniform-Memory-Access Shared-Memory Architecture

WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
Selecting Problems for Algorithm Evaluation

WAE '99 Proceedings of the 3rd International Workshop on Algorithm Engineering
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework

IEEE Transactions on Parallel and Distributed Systems
A heap-based algorithm for the study of one-dimensional particle systems

Journal of Computational Physics
A comparison of cache aware and cache oblivious static search trees using program instrumentation

Experimental algorithmics
Reconstructing optimal phylogenetic trees: a challenge in experimental algorithmics

Experimental algorithmics
Navigation piles with applications to sorting, priority queues, and priority deques

Nordic Journal of Computing
A blocked all-pairs shortest-paths algorithm

Journal of Experimental Algorithmics (JEA)
Optimizing Graph Algorithms for Improved Cache Performance

IEEE Transactions on Parallel and Distributed Systems
Identifying opportunities for automatic remote field cloning

CASCON '04 Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research
The d-deap: a fast and simple cache-aligned d-ary deap

Information Processing Letters
Cache oblivious algorithms for nonserial polyadic programming

The Journal of Supercomputing
Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
An experimental study of sorting and branch prediction

Journal of Experimental Algorithmics (JEA)
Terracost: Computing least-cost-path surfaces for massive grid terrains

Journal of Experimental Algorithmics (JEA)
The d-deap*: a fast and simple cache-aligned d-ary deap

Information Processing Letters
Algorithms for memory hierarchies: advanced lectures

Algorithms for memory hierarchies: advanced lectures
Revisiting priority queues for image analysis

Pattern Recognition
Redesigning the string hash table, burst trie, and BST to exploit cache

Journal of Experimental Algorithmics (JEA)
On experimental algorithmics: an interview with Catherine McGeoch and Bernard Moret

Ubiquity
Shared hardware data structures for hard real-time systems

Proceedings of the tenth ACM international conference on Embedded software
Polynomial division using dynamic arrays, heaps, and packed exponent vectors

CASC'07 Proceedings of the 10th international conference on Computer Algebra in Scientific Computing
On experimental algorithmics: an interview with Catherine McGeoch and Bernard Moret

Ubiquity
ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors

Proceedings of Workshop on General Purpose Processing Using GPUs

Quantified Score

Hi-index	0.01

Abstract

As memory access times grow larger relative to processor cycle times, the cache performance of algorithms has an increasingly large impact on overall performance. Unfortunately, most commonly used algorithms were not designed with cache performance in mind. This paper investigates the cache performance of implicit heaps. We present optimizations which significantly reduce the cache misses that heaps incur and improve their overall performance. We present an analytical model called collective analysis that allows cache performance to be predicted as a function of both cache configuration and algorithm configuration. As part of our investigation, we perform an approximate analysis of the cache performance of both traditional heaps and our improved heaps in our model. In addition empirical data is given for five architectures to show the impact our optimizations have on overall performance. We also revisit a priority queue study originally performed by Jones [25]. Due to the increases in cache miss penalties, the relative performance results we obtain on today's machines differ greatly from the machines of only ten years ago. We compare the performance of implicit heaps, skew heaps and splay trees and discuss the difference between our results and Jones's.
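
For readers unfamiliar with the term, an implicit heap stores the tree in a flat array and finds parents and children by index arithmetic instead of pointers. The sketch below is a minimal illustration of that layout, not the authors' implementation. The class name `ImplicitDHeap` and the fanout parameter `d` are assumptions introduced here; the abstract does not spell out which optimizations the paper applies. A Python list holds references rather than packed keys, so the code shows only the indexing scheme and says nothing about real cache behavior.

```python
class ImplicitDHeap:
    """Min-heap in a flat list (hypothetical illustration).

    The children of index i live at d*i + 1 .. d*i + d, and the parent
    of index i > 0 is (i - 1) // d. With d = 2 this is the usual
    implicit binary heap.
    """

    def __init__(self, d=2):
        if d < 2:
            raise ValueError("fanout d must be at least 2")
        self.d = d  # fanout: an assumption of this sketch
        self.a = []

    def __len__(self):
        return len(self.a)

    def push(self, key):
        a, d = self.a, self.d
        a.append(key)
        i = len(a) - 1
        # Sift up: swap with the parent while the parent is larger.
        while i > 0:
            parent = (i - 1) // d
            if a[parent] <= a[i]:
                break
            a[parent], a[i] = a[i], a[parent]
            i = parent

    def pop_min(self):
        a, d = self.a, self.d
        if not a:
            raise IndexError("pop from empty heap")
        top = a[0]
        last = a.pop()
        if a:
            a[0] = last
            i, n = 0, len(a)
            # Sift down: swap with the smallest child while it is smaller.
            while True:
                first = d * i + 1
                if first >= n:
                    break
                # The (up to d) siblings occupy adjacent array slots.
                smallest = min(range(first, min(first + d, n)),
                               key=a.__getitem__)
                if a[i] <= a[smallest]:
                    break
                a[i], a[smallest] = a[smallest], a[i]
                i = smallest
        return top
```

Because siblings sit in adjacent slots, a layout like this in a language with packed arrays can place a whole group of siblings in one cache line, which is the kind of effect a cache-performance study of heaps would measure.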