Static grouping of small objects to enhance performance of a paged virtual memory
ACM Transactions on Computer Systems (TOCS)
Improving locality of reference in a garbage-collecting memory management system
Communications of the ACM
Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Clustering a DAG for CAD Databases
IEEE Transactions on Software Engineering
ACM Transactions on Computer Systems (TOCS)
Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
The performance and utility of the Cactis implementation algorithms
Proceedings of the sixteenth international conference on Very large databases
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Effective “static-graph” reorganization to improve locality in garbage-collected systems
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
On the performance of object clustering techniques
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
A Model of Workloads and its Use in Miss-Rate Prediction for Fully Associative Caches
IEEE Transactions on Computers
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
The RADIANCE lighting simulation and rendering system
SIGGRAPH '94 Proceedings of the 21st annual conference on Computer graphics and interactive techniques
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Interleaving: a multithreading technique targeting multiprocessors and workstations
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
The impact of architectural trends on operating system performance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The influence of caches on the performance of heaps
Journal of Experimental Algorithmics (JEA)
An evaluation of memory consistency models for shared-memory systems with ILP processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Studies of Windows NT performance using dynamic execution traces
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Procedure placement using temporal ordering information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
Segregating heap objects by reference behavior and lifetime
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious structure definition
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
The influence of caches on the performance of sorting
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Principles of Optimal Page Replacement
Journal of the ACM (JACM)
Performance Analysis of Cache Memories
Journal of the ACM (JACM)
ACM Computing Surveys (CSUR)
ACM Computing Surveys (CSUR)
A simple linear model of demand paging performance
Communications of the ACM
Program Behavior: Models and Measurements
Program Behavior: Models and Measurements
IEEE Micro
Object Type Directed Garbage Collection To Improve Locality
IWMM '92 Proceedings of the International Workshop on Memory Management
VIS: A System for Verification and Synthesis
CAV '96 Proceedings of the 8th International Conference on Computer Aided Verification
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Garbage collection in a large LISP system
LFP '84 Proceedings of the 1984 ACM Symposium on LISP and functional programming
Cache hit ratios with geometric task switch intervals
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Improving Cache Behavior of Dynamically Allocated Data Structures
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Effective jump-pointer prefetching for linked data structures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Cache-conscious structure definition
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Reducing transfer delay using Java class file splitting and prefetching
Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
An automatic object inlining optimization and its evaluation
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Improving fine-grained irregular shared-memory benchmarks by data reordering
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
ICS '01 Proceedings of the 15th international conference on Supercomputing
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Efficient representations and abstractions for quantifying and exploiting data reference locality
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
The hardness of cache conscious data placement
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
An efficient profile-analysis framework for data-layout optimizations
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Exploiting prolific types for memory management and optimizations
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Software caching vs. prefetching
Proceedings of the 3rd international symposium on Memory management
Design space optimization of embedded memory systems via data remapping
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Computation regrouping: restructuring programs for temporal data cache locality
ICS '02 Proceedings of the 16th international conference on Supercomputing
Fractal prefetching B+-Trees: optimizing both cache and disk performance
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Reconsidering custom memory allocation
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Creating and preserving locality of java applications at allocation and garbage collection times
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Immutability specification and its applications
JGI '02 Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande
Data remapping for design space optimization of embedded memory systems
ACM Transactions on Embedded Computing Systems (TECS)
The set-associative cache performance of search trees
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Optimizing Graph Algorithms for Improved Cache Performance
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
DBMSs on a Modern Processor: Where Does Time Go?
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
A Programmable Memory Hierarchy for Prefetching Linked Data Structures
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
A Graph-Free Approach to Data-Flow Analysis
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Data Compression Transformations for Dynamically Allocated Data Structures
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Array Unification: A Locality Optimization Technique
CC '01 Proceedings of the 10th International Conference on Compiler Construction
METRIC: tracking down inefficiencies in the memory hierarchy via binary rewriting
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Continuous program optimization: A case study
ACM Transactions on Programming Languages and Systems (TOPLAS)
Effect of node size on the performance of cache-conscious B+-trees
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Profile-guided I/O partitioning
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications
IEEE Transactions on Computers
A Framework to Capture Dynamic Data Structures in Pointer-Based Codes
IEEE Transactions on Parallel and Distributed Systems
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Application of Symbolic Approach to the Bernstein Expansion for Program Analysis and Optimization
Programming and Computing Software
Array regrouping and structure splitting using whole-program reference affinity
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Prefetch injection based on hardware monitoring and object metadata
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Restructuring computations for temporal data cache locality
International Journal of Parallel Programming
Optimizing Graph Algorithms for Improved Cache Performance
IEEE Transactions on Parallel and Distributed Systems
The garbage collection advantage: improving program locality
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Finding your cronies: static analysis for dynamic object colocation
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Identifying opportunities for automatic remote field cloning
CASCON '04 Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research
Journal of Experimental Algorithmics (JEA)
Tolerating memory latency through push prefetching for pointer-intensive applications
ACM Transactions on Architecture and Code Optimization (TACO)
A Constraint Network Based Approach to Memory Layout Optimization
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Data Centric Cache Measurement on the Intel ltanium 2 Processor
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Memory Profiling using Hardware Counters
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Identifying and Exploiting Spatial Regularity in Data Memory References
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
NUMA-Aware Java Heaps for Server Applications
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Optimizing Checkpoint Sizes in the C3 System
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Journal of VLSI Signal Processing Systems
Automatic pool allocation: improving performance by controlling data structure layout in the heap
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Memory-side prefetching for linked data structures for processor-in-memory systems
Journal of Parallel and Distributed Computing
Generating cache hints for improved program efficiency
Journal of Systems Architecture: the EUROMICRO Journal
Stream Programming on General-Purpose Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A hierarchical model of data locality
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A locality-improving dynamic memory allocator
Proceedings of the 2005 workshop on Memory system performance
Optimizing instruction cache performance of embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Practical Structure Layout Optimization and Advice
Proceedings of the International Symposium on Code Generation and Optimization
Dynamic memory optimization using pool allocation and prefetching
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Memory-manager/scheduler co-design: optimizing event-driven servers to improve cache behavior
Proceedings of the 5th international symposium on Memory management
DieHard: probabilistic memory safety for unsafe languages
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Cache-conscious coallocation of hot data streams
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
The hardness of cache conscious data placement
Nordic Journal of Computing
Data trace cache: an application specific cache architecture
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
Whole-program optimization of global variable layout
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Instruction scheduling for a tiled dataflow architecture
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Cache-oblivious nested-loop joins
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Cache-Friendly implementations of transitive closure
Journal of Experimental Algorithmics (JEA)
Issues in holistic system design
Proceedings of the 3rd workshop on Programming languages and operating systems: linguistic support for modern operating systems
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies
ACM Transactions on Programming Languages and Systems (TOPLAS)
Framework for instruction-level tracing and analysis of program executions
Proceedings of the 2nd international conference on Virtual execution environments
Cache oblivious algorithms for nonserial polyadic programming
The Journal of Supercomputing
Page mapping for heterogeneously partitioned caches: Complexity and heuristics
Journal of Embedded Computing - Cache exploitation in embedded systems
Structure Layout Optimization for Multithreaded Programs
Proceedings of the International Symposium on Code Generation and Optimization
Compiler-managed partitioned data caches for low power
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
HAT-trie: a cache-conscious trie-based data structure for strings
ACSC '07 Proceedings of the thirtieth Australasian conference on Computer science - Volume 62
Data morphing: an adaptive, cache-conscious storage technique
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Heterogeneously tagged caches for low-power embedded systems with virtual memory support
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A general framework for improving query processing performance on multi-level memory hierarchies
DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
Cache-oblivious databases: Limitations and opportunities
ACM Transactions on Database Systems (TODS)
MPADS: memory-pooling-assisted data splitting
Proceedings of the 7th international symposium on Memory management
LeakSurvivor: towards safely tolerating memory leaks for garbage-collected languages
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Supporting Huge Address Spaces in a Virtual Machine for Java on a Cluster
Languages and Compilers for Parallel Computing
Two memory allocators that use hints to improve locality
Proceedings of the 2009 international symposium on Memory management
Perflint: A Context Sensitive Performance Advisor for C++ Programs
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Memory management thread for heap allocation intensive sequential applications
Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
DDT: design and evaluation of a dynamic program analysis for optimizing data structure usage
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Cache line reservation: exploring a scheme for cache-friendly object allocation
CASCON '09 Proceedings of the 2009 Conference of the Center for Advanced Studies on Collaborative Research
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
Composition-based Cache simulation for structure reorganization
Journal of Systems Architecture: the EUROMICRO Journal
Tree-traversal orientation analysis
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Custom memory allocation for free
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Customized placement for high performance embedded processor caches
ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
On improving heap memory layout by dynamic pool allocation
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
The locality of concurrent write barriers
Proceedings of the 2010 international symposium on Memory management
A graph theoretic approach to cache-conscious placement of data for direct mapped caches
Proceedings of the 2010 international symposium on Memory management
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Optimal resource management for a model driven LTE protocol stack on a multicore platform
Proceedings of the 8th ACM international workshop on Mobility management and wireless access
Engineering scalable, cache and space efficient tries for strings
The VLDB Journal — The International Journal on Very Large Data Bases
Improving locality of nonserial polyadic dynamic programming
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
ULCC: a user-level facility for optimizing shared cache performance on multicores
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Patterns for cache optimizations on multi-processor machines
Proceedings of the 2010 Workshop on Parallel Programming Patterns
Structuring the unstructured middle with chunk computing
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Enhancing locality for recursive traversals of recursive structures
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
On-the-fly structure splitting for heap objects
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Improving shared cache behavior of multithreaded object-oriented applications in multicores
Proceedings of the International Conference on Computer-Aided Design
Cache-Conscious collision resolution in string hash tables
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Identifying the sources of cache misses in Java programs without relying on hardware counters
Proceedings of the 2012 international symposium on Memory Management
Automatically enhancing locality for tree traversals with traversal splicing
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
OOPSLA 2002: Reconsidering custom memory allocation
ACM SIGPLAN Notices - Supplemental issue
Linearizing irregular memory accesses for improved correlated prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Hardware trends have produced an increasing disparity between processor speeds and memory access times. While a variety of techniques for tolerating or reducing memory latency have been proposed, these are rarely successful for pointer-manipulating programs.This paper explores a complementary approach that attacks the source (poor reference locality) of the problem rather than its manifestation (memory latency). It demonstrates that careful data organization and layout provides an essential mechanism to improve the cache locality of pointer-manipulating programs and consequently, their performance. It explores two placement techniques---clustering and coloring---that improve cache performance by increasing a pointer structure's spatial and temporal locality, and by reducing cache-conflicts.To reduce the cost of applying these techniques, this paper discusses two strategies---cache-conscious reorganization and cache-conscious allocation---and describes two semi-automatic tools---ccmorph and ccmalloc---that use these strategies to produce cache-conscious pointer structure layouts. ccmorph is a transparent tree reorganizer that utilizes topology information to cluster and color the structure. ccmalloc is a cache-conscious heap allocator that attempts to co-locate contemporaneously accessed data elements in the same physical cache block. Our evaluations, with microbenchmarks, several small benchmarks, and a couple of large real-world applications, demonstrate that the cache-conscious structure layouts produced by ccmorph and ccmalloc offer large performance benefits---in most cases, significantly outperforming state-of-the-art prefetching.