Rapid increases in processor speed, paired with slower increases in memory speed, have produced memory access times that exceed the cost of simple arithmetic operations. The ubiquitous hardware solution to this problem is the memory cache, which exploits program locality to reduce average access latency. Other techniques use complex hardware and software to reduce or hide the high cost of memory accesses.

The processor-memory gap requires a hierarchy of two or more caches between the processor and memory. The cost of finding data in this hierarchy undercuts the fundamental RAM model assumption that all memory accesses have unit cost.

To narrow the widening gap between processor and memory performance, the authors propose exploiting a property of pointer structures: the elements of a compound data structure can be placed in different memory, and therefore cache, locations. Careful placement of structure elements enhances the performance of pointer-manipulating programs by improving their cache locality.
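The effect of placement on cache locality can be illustrated with a toy simulation. The sketch below is not the authors' technique or their measurements; it assumes a small direct-mapped cache and hypothetical sizes (`LINE_SIZE`, `NUM_LINES`, `NODE_SIZE`), and simply counts misses for one traversal of a 32-node list under two made-up layouts: nodes packed back to back, and nodes scattered so that they all compete for the same cache line.

```python
# Illustrative sketch only: every size and address here is an assumption
# chosen for the example, not a figure from the article.
LINE_SIZE = 64   # assumed cache line size in bytes
NUM_LINES = 8    # assumed number of lines in a tiny direct-mapped cache
NODE_SIZE = 16   # assumed size of one list node in bytes


def count_misses(addresses):
    """Simulate a direct-mapped cache and count misses for an access trace."""
    cache = [None] * NUM_LINES       # memory block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // LINE_SIZE    # memory block containing this address
        index = block % NUM_LINES    # cache line that block maps to
        if cache[index] != block:    # miss: load the block, evicting the old one
            misses += 1
            cache[index] = block
    return misses


def clustered_layout(n):
    """Hypothetical layout: nodes packed contiguously in traversal order."""
    return [i * NODE_SIZE for i in range(n)]


def scattered_layout(n, stride=LINE_SIZE * NUM_LINES):
    """Hypothetical layout: each node in its own block, all mapping to one line."""
    return [i * stride for i in range(n)]


N = 32
clustered_misses = count_misses(clustered_layout(N))
scattered_misses = count_misses(scattered_layout(N))
print("clustered:", clustered_misses, "scattered:", scattered_misses)
```

Under these assumed parameters, four nodes share each cache line in the clustered layout, so one traversal misses 8 times, while the scattered layout misses on all 32 accesses. The program logic is identical in both cases; only the placement of the elements differs.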