A model for hierarchical memory
STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
The input/output complexity of sorting and related problems
Communications of the ACM
Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor
Digital Technical Journal - Special 10th anniversary issue
Continuous profiling: where have all the cycles gone?
Proceedings of the sixteenth ACM symposium on Operating systems principles
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
The art of computer programming, volume 3: (2nd ed.) sorting and searching
The art of computer programming, volume 3: (2nd ed.) sorting and searching
The influence of caches on the performance of sorting
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Cache performance analysis of traversals and random accesses
Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Towards a theory of cache-efficient algorithms
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
Accessing Multiple Sequences Through Set Associative Caches
ICAL '99 Proceedings of the 26th International Colloquium on Automata, Languages and Programming
Fast Priority Queues for Cached Memory
ALENEX '99 Selected papers from the International Workshop on Algorithm Engineering and Experimentation
Analysing Cache Effects in Distribution Sorting
WAE '99 Proceedings of the 3rd International Workshop on Algorithm Engineering
High-Performance Algorithm Engineering for Computational Phylogenetics
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
An Out-of-Core Sorting Algorithm for Clusters with Processors at Different Speed
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Future Generation Computer Systems - Systems performance analysis and evaluation
Hi-index | 0.00 |
Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line fetching, and streaming behavior. Inadequate models lead to poor algorithmic choices and an incomplete understanding of algorithm behavior on real machines. A key step toward developing better models is to quantify the performance effects of features not reflected in the models. This paper explores the effect of memory system features on sorting performance. We introduce a new cache-conscious sorting algorithm, R-merge, which achieves better performance in practice over algorithms that are theoretically superior under the models. R-merge is designed to minimize memory stall cycles rather than cache misses, considering features common to many system designs.