Cache-conscious data placement

Authors:
Brad Calder;Chandra Krintz;Simmi John;Todd Austin
Affiliations:
Dept. of Computer Science and Engineering, University of California, San Diego;Dept. of Computer Science and Engineering, University of California, San Diego;Dept. of Computer Science and Engineering, University of California, San Diego;Microcomputer Research Labs, Intel Corporation
Venue:
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Year:
1998

Abstract

As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction cache performance by mapping code with temporal locality to different cache blocks in the virtual address space, eliminating cache conflicts. These code placement techniques can be applied directly to the problem of placing data for improved data cache performance. In this paper we present a general framework for Cache Conscious Data Placement. This is a compiler-directed approach that creates an address placement for the stack (local variables), global variables, heap objects, and constants in order to reduce data cache misses. The placement of data objects is guided by a temporal relationship graph between objects generated via profiling. Our results show that profile-driven data placement significantly reduces the data miss rate by 24% on average.
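The idea of a profile-derived temporal relationship graph guiding placement can be illustrated with a small sketch. This is not the paper's algorithm. It is a simplified illustration under stated assumptions: every object fits in one cache line, the cache is direct-mapped, and the profile is a flat list of object names. The function names `build_trg`, `place`, and `conflict_cost`, the sliding-window weighting, and the greedy heuristic are all hypothetical choices made for this example.

```python
from collections import defaultdict, deque


def build_trg(trace, window=2):
    """Build a toy temporal relationship graph from a reference trace.

    An edge weight counts how often two distinct objects were referenced
    within `window` references of each other (an assumed proxy for
    "would conflict if mapped to the same cache set").
    """
    weights = defaultdict(int)
    recent = deque(maxlen=window)
    for obj in trace:
        for other in set(recent):
            if other != obj:
                weights[frozenset((obj, other))] += 1
        recent.append(obj)
    return weights


def place(objects, weights, num_sets):
    """Greedily assign each object to a cache set (assumed heuristic).

    Objects with the largest total edge weight are placed first; each goes
    to the set where it conflicts least with objects already placed there.
    """
    def total_weight(o):
        return sum(w for pair, w in weights.items() if o in pair)

    placement = {}
    for obj in sorted(objects, key=lambda o: (-total_weight(o), o)):
        def cost(s):
            return sum(weights.get(frozenset((obj, p)), 0)
                       for p, ps in placement.items() if ps == s)
        placement[obj] = min(range(num_sets), key=cost)
    return placement


def conflict_cost(placement, weights):
    """Sum the edge weights of object pairs that share a cache set."""
    return sum(w for pair, w in weights.items()
               if len({placement[o] for o in pair}) == 1)


if __name__ == "__main__":
    trace = ["a", "b", "a", "b", "c", "d", "c", "d"]
    trg = build_trg(trace, window=2)
    layout = place(sorted(set(trace)), trg, num_sets=2)
    print(layout, conflict_cost(layout, trg))
```

In this toy trace, `a` and `b` are referenced in alternation, as are `c` and `d`, so the greedy pass puts the members of each pair in different cache sets. A real placement would also have to account for object sizes, the distinct handling of stack, global, heap, and constant data, and the actual cache geometry.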