Improving memory performance of sorting algorithms

Authors: Li Xiao; Xiaodong Zhang; Stefan A. Kubricht
Affiliations: College of William and Mary (all authors)
Venue: Journal of Experimental Algorithmics (JEA)
Year: 2000

Abstract

Accounting for the memory hierarchy when designing and implementing sorting algorithms can significantly improve execution performance. Existing algorithms mainly attempt to reduce capacity misses on direct-mapped caches. To reduce the other types of cache misses that occur in the more common set-associative caches and in the TLB, we further restructure the mergesort and quicksort algorithms by integrating tiling, padding, and buffering techniques and by repartitioning the data set. Our study shows that these new methods yield substantial performance improvements.
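
To make the tiling idea mentioned in the abstract concrete, the following is a minimal sketch of a tiled mergesort in Python. It is an illustration of the general technique only, not the authors' implementation: the function name `tiled_mergesort` and the parameter `tile_size` are hypothetical, the tile size is assumed to stand in for the number of elements that fit in cache, and the padding, buffering, and repartitioning techniques the abstract refers to are not shown. Python lists also do not expose cache behavior the way a C array would, so the sketch shows the structure of the algorithm rather than its performance.

```python
def tiled_mergesort(data, tile_size=1024):
    """Return a sorted copy of data using a two-phase tiled mergesort.

    tile_size is a hypothetical tuning parameter: the number of elements
    assumed to fit in cache at once.
    """
    if tile_size < 1:
        raise ValueError("tile_size must be positive")

    n = len(data)
    src = list(data)

    # Phase 1: sort each tile on its own, so that all accesses while
    # sorting a tile stay inside one (assumed) cache-sized block.
    for start in range(0, n, tile_size):
        src[start:start + tile_size] = sorted(src[start:start + tile_size])

    # Phase 2: merge the sorted tiles bottom-up, doubling the run width
    # on each pass and alternating between two buffers.
    dst = [None] * n
    width = tile_size
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            i, j, k = lo, mid, lo
            while i < mid and j < hi:
                # Take from the right run only when strictly smaller,
                # which keeps the merge stable.
                if src[j] < src[i]:
                    dst[k] = src[j]
                    j += 1
                else:
                    dst[k] = src[i]
                    i += 1
                k += 1
            # Exactly one of the two runs still has elements left.
            dst[k:hi] = src[i:mid] + src[j:hi]
        src, dst = dst, src
        width *= 2

    return src
```

Calling `tiled_mergesort(values, tile_size=64)` returns a new sorted list and leaves `values` unchanged. In a cache-aware setting the tile size would be chosen from the cache capacity and element size of the target machine; the default used here is arbitrary.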