Performance engineering case study: heap construction

Authors:
Jesper Bojesen; Jyrki Katajainen; Maz Spork
Affiliations:
Univ. of Copenhagen, Denmark; Univ. of Copenhagen, Denmark; Univ. of Copenhagen, Denmark
Venue:
Journal of Experimental Algorithmics (JEA)
Year:
2000

References: 24
Cited by: 4

Amortized efficiency of list update and paging rules

Communications of the ACM
Building heaps fast

Journal of Algorithms
The C programming language

The C programming language
Algorithms from P to NP (vol. 1): design and efficiency

Algorithms from P to NP (vol. 1): design and efficiency
Average case analysis of heap building by repeated insertion

Journal of Algorithms
A note on HEAPSORT

The Computer Journal - Special issue on models and architectures
The worst case complexity of McDiarmid and Reed's variant of BOTTOM-UP HEAPSORT is less than n log n + 1.1n

Information and Computation
The discoveries of continuations

Lisp and Symbolic Computation - Special issue on continuations—part I
The influence of caches on the performance of heaps

Journal of Experimental Algorithmics (JEA)
Analysis of Hoare's FIND algorithm with median-of-three partition

Random Structures & Algorithms - Special issue: average-case analysis of algorithms
The art of computer programming, volume 1 (3rd ed.): fundamental algorithms

The art of computer programming, volume 1 (3rd ed.): fundamental algorithms
Computer architecture (2nd ed.): a quantitative approach

Computer architecture (2nd ed.): a quantitative approach
The art of computer programming, volume 3: (2nd ed.) sorting and searching

The art of computer programming, volume 3: (2nd ed.) sorting and searching
Heaps and heapsort on secondary storage

Theoretical Computer Science
The influence of caches on the performance of sorting

Journal of Algorithms
Implementing Quicksort programs

Communications of the ACM
Expected time bounds for selection

Communications of the ACM
Algorithm 245: Treesort

Communications of the ACM
Algorithm 65: find

Communications of the ACM
The C++ Programming Language, Third Edition

The C++ Programming Language, Third Edition
Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes

IEEE Transactions on Computers
Sorting and Searching on the Word RAM

STACS '98 Proceedings of the 15th Annual Symposium on Theoretical Aspects of Computer Science
A Meticulous Analysis of Mergesort Programs

CIAC '97 Proceedings of the Third Italian Conference on Algorithms and Complexity
External selection

STACS'99 Proceedings of the 16th annual conference on Theoretical aspects of computer science

Navigation piles with applications to sorting, priority queues, and priority deques

Nordic Journal of Computing
Algorithms for memory hierarchies: advanced lectures

Algorithms for memory hierarchies: advanced lectures
In-place heap construction with optimized comparisons, moves, and cache misses

MFCS'12 Proceedings of the 37th international conference on Mathematical Foundations of Computer Science
Weak heaps engineered

Journal of Discrete Algorithms


Abstract

The behaviour of three methods for constructing a binary heap on a computer with a hierarchical memory is studied. The methods considered are the original one proposed by Williams [1964], in which elements are repeatedly inserted into a single heap; the improvement by Floyd [1964], in which small heaps are repeatedly merged into bigger heaps; and a recent method proposed, e.g., by Fadel et al. [1999], in which a heap is built layerwise. Both the worst-case number of instructions and the worst-case number of cache misses are analysed. It is well known that Floyd's method has the best instruction count. Let N denote the size of the heap to be constructed, B the number of elements that fit into a cache line, and let c and d be some positive constants. Our analysis shows that, under reasonable assumptions, repeated insertion and layerwise construction both incur at most cN/B cache misses, whereas repeated merging, as programmed by Floyd, can incur more than (dN log_2 B)/B cache misses. However, for our memory-tuned versions of repeated insertion and repeated merging, the number of cache misses incurred is close to the optimal bound N/B. In addition to these theoretical findings, we report many practical experiences that we hope will be valuable to others doing experimental algorithmic work.
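
To make the first two methods named in the abstract concrete, the following is a minimal sketch of their textbook formulations: repeated insertion in the style of Williams and repeated merging in the style of Floyd. It is not the authors' code and does not include their memory tuning or the layerwise method. The max-heap ordering, the 0-based array layout, and all function names (sift_up, sift_down, build_by_insertion, build_by_merging, is_max_heap) are assumptions made for illustration.

```python
def sift_up(a, i):
    """Move a[i] towards the root until its parent is at least as large."""
    x = a[i]
    while i > 0:
        parent = (i - 1) // 2
        if a[parent] >= x:
            break
        a[i] = a[parent]  # pull the smaller parent down one level
        i = parent
    a[i] = x


def sift_down(a, i, n):
    """Move a[i] towards the leaves within a[0:n] until no child is larger."""
    x = a[i]
    while True:
        child = 2 * i + 1
        if child >= n:
            break
        # Pick the larger of the two children, if a right child exists.
        if child + 1 < n and a[child + 1] > a[child]:
            child += 1
        if a[child] <= x:
            break
        a[i] = a[child]  # pull the larger child up one level
        i = child
    a[i] = x


def build_by_insertion(a):
    """Repeated insertion: grow one heap a[0:i] by adding a[i] at its end."""
    for i in range(1, len(a)):
        sift_up(a, i)


def build_by_merging(a):
    """Repeated merging: visit internal nodes from last to first and join
    the two sub-heaps below each node with the element stored at that node."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)


def is_max_heap(a):
    """Check that every element is no larger than its parent."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))


if __name__ == "__main__":
    data = [5, 3, 8, 1, 9, 2, 7]
    for build in (build_by_insertion, build_by_merging):
        heap = list(data)
        build(heap)
        print(build.__name__, heap, is_max_heap(heap))
```

In this sketch, repeated insertion scans the array front to back and climbs towards the root from each new element, while repeated merging scans the internal nodes back to front and descends towards the leaves. The two loops therefore touch memory in different orders, which is the kind of difference the cache-miss analysis in the abstract is concerned with.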