Accounting for the memory hierarchy when designing and implementing sorting algorithms can significantly improve execution performance. Existing algorithms mainly attempt to reduce capacity misses on direct-mapped caches. To reduce the other types of cache misses that occur in the more common set-associative caches and in the TLB, we further restructure the mergesort and quicksort algorithms by integrating tiling, padding, and buffering techniques and by repartitioning the data set. Our study shows that substantial performance improvements can be obtained using our new methods.
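
As a rough illustration of the tiling idea mentioned above, the following minimal sketch sorts the data in fixed-size tiles first and then merges the sorted tiles pairwise. It is not the restructured mergesort described in this work: the names `tiled_mergesort` and `merge` and the `tile_size` parameter are assumptions made for the example, the padding, buffering, and repartitioning techniques are not shown, and Python lists do not expose the cache behavior that a low-level implementation would target. In such an implementation, the tile would presumably be sized so that it fits in the cache.

```python
def merge(left, right):
    """Standard two-way merge of two sorted lists into a new sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # At most one of these two slices is non-empty.
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def tiled_mergesort(data, tile_size=1024):
    """Sort `data` in two phases (hypothetical sketch of tiling).

    Phase 1 sorts each tile of `tile_size` elements on its own, so the
    working set of each sort stays small. Phase 2 merges the sorted
    tiles pairwise until a single sorted run remains.
    """
    if tile_size < 1:
        raise ValueError("tile_size must be at least 1")

    # Phase 1: sort each tile independently.
    runs = [sorted(data[start:start + tile_size])
            for start in range(0, len(data), tile_size)]

    # Phase 2: merge neighbouring runs until one run is left.
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge(runs[k], runs[k + 1]))
            else:
                # Odd run out: carry it to the next round unchanged.
                merged.append(runs[k])
        runs = merged

    return runs[0] if runs else []
```

For example, `tiled_mergesort([5, 3, 1, 4, 2], tile_size=2)` sorts the tiles `[5, 3]`, `[1, 4]`, and `[2]` separately before merging them into one sorted list.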