Inexpensive implementations of set-associativity
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Analysis of vector access performance on skewed interleaved memory
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Procedure merging with instruction caches
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
On reconfigurable on-chip data caches
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Ordering functions for improving memory reference locality in a shared memory multiprocessor system
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The effect of page allocation on caches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Improving the cache locality of memory allocation
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Using lifetime predictors to improve memory allocation performance
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Balanced scheduling: instruction scheduling when memory latency is uncertain
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The Chinese remainder theorem and the prime memory system
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Tradeoffs in two-level on-chip caching
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Complexity/performance tradeoffs with non-blocking loads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Reducing false sharing on shared memory multiprocessors through compile time data transformations
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler support for software-based cache partitioning
LCTES '95 Proceedings of the ACM SIGPLAN 1995 workshop on Languages, compilers, & tools for real-time systems
Hardware and software mechanisms for reducing load latency
Hardware and software mechanisms for reducing load latency
Efficient procedure mapping using cache line coloring
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Procedure placement using temporal ordering information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Decoupled access/execute computer architectures
ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Optimizing instruction cache performance for operating system intensive workloads
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Effective jump-pointer prefetching for linked data structures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Cache-conscious structure definition
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Reducing cache misses using hardware and software page placement
ICS '99 Proceedings of the 13th international conference on Supercomputing
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automated data-member layout of heap objects to improve memory-hierarchy performance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Improving fine-grained irregular shared-memory benchmarks by data reordering
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
ICS '01 Proceedings of the 15th international conference on Supercomputing
A framework for reducing the cost of instrumented code
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Efficient representations and abstractions for quantifying and exploiting data reference locality
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
The hardness of cache conscious data placement
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
An efficient profile-analysis framework for data-layout optimizations
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Software caching vs. prefetching
Proceedings of the 3rd international symposium on Memory management
Design space optimization of embedded memory systems via data remapping
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Creating and preserving locality of java applications at allocation and garbage collection times
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Data page layouts for relational databases on deep memory hierarchies
The VLDB Journal — The International Journal on Very Large Data Bases
Data remapping for design space optimization of embedded memory systems
ACM Transactions on Embedded Computing Systems (TECS)
The set-associative cache performance of search trees
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Weaving Relations for Cache Performance
Proceedings of the 27th International Conference on Very Large Data Bases
Data Compression Transformations for Dynamically Allocated Data Structures
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Optimization opportunities created by global data reordering
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Continuous program optimization: A case study
ACM Transactions on Programming Languages and Systems (TOPLAS)
Exposing Memory Access Regularities Using Object-Relative Memory Profiling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Array regrouping and structure splitting using whole-program reference affinity
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Prefetch injection based on hardware monitoring and object metadata
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
EMBARC: an efficient memory bank assignment algorithm for retargetable compilers
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Journal of Experimental Algorithmics (JEA)
Identifying and Exploiting Spatial Regularity in Data Memory References
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
NUMA-Aware Java Heaps for Server Applications
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Owl: next generation system monitoring
Proceedings of the 2nd conference on Computing frontiers
Automatic pool allocation: improving performance by controlling data structure layout in the heap
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Memory-side prefetching for linked data structures for processor-in-memory systems
Journal of Parallel and Distributed Computing
A hierarchical model of data locality
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Recursive data structure profiling
Proceedings of the 2005 workshop on Memory system performance
Practical Structure Layout Optimization and Advice
Proceedings of the International Symposium on Code Generation and Optimization
Dynamic memory optimization using pool allocation and prefetching
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Memory-manager/scheduler co-design: optimizing event-driven servers to improve cache behavior
Proceedings of the 5th international symposium on Memory management
Cache-conscious coallocation of hot data streams
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
The hardness of cache conscious data placement
Nordic Journal of Computing
Whole-program optimization of global variable layout
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Reliability-aware data placement for partial memory protection in embedded processors
Proceedings of the 2006 workshop on Memory system performance and correctness
Power-efficient prefetching for embedded processors
ACM Transactions on Embedded Computing Systems (TECS)
Page mapping for heterogeneously partitioned caches: Complexity and heuristics
Journal of Embedded Computing - Cache exploitation in embedded systems
Structure Layout Optimization for Multithreaded Programs
Proceedings of the International Symposium on Code Generation and Optimization
Data layouts for object-oriented programs
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Removing the memory limitations of sensor networks with flash-based virtual memory
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Data morphing: an adaptive, cache-conscious storage technique
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Heterogeneously tagged caches for low-power embedded systems with virtual memory support
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Traversal caches: a first step towards FPGA acceleration of pointer-based data structures
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Exploiting selective placement for low-cost memory protection
ACM Transactions on Architecture and Code Optimization (TACO)
Revisiting Cache Block Superloading
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Placement optimization using data context collected during garbage collection
Proceedings of the 2009 international symposium on Memory management
A component model of spatial locality
Proceedings of the 2009 international symposium on Memory management
Two memory allocators that use hints to improve locality
Proceedings of the 2009 international symposium on Memory management
Hardware-compiler co-design for adjustable data power savings
Microprocessors & Microsystems
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Tree-traversal orientation analysis
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Custom memory allocation for free
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
On improving heap memory layout by dynamic pool allocation
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Compiler techniques for reducing data cache miss rate on a multithreaded architecture
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
A graph theoretic approach to cache-conscious placement of data for direct mapped caches
Proceedings of the 2010 international symposium on Memory management
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Fast and compact hash tables for integer keys
ACSC '09 Proceedings of the Thirty-Second Australasian Conference on Computer Science - Volume 91
Engineering scalable, cache and space efficient tries for strings
The VLDB Journal — The International Journal on Very Large Data Bases
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
Data layout for cache performance on a multithreaded architecture
Transactions on high-performance embedded architectures and compilers III
Cache index-aware memory allocation
Proceedings of the international symposium on Memory management
Reducing Network-on-Chip energy consumption through spatial locality speculation
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
On-the-fly structure splitting for heap objects
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Performance analysis of the cache conscious-generalized search tree
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part III
Improving shared cache behavior of multithreaded object-oriented applications in multicores
Proceedings of the International Conference on Computer-Aided Design
Optimizing data locality using array tiling
Proceedings of the International Conference on Computer-Aided Design
Reducing energy and increasing performance with traffic optimization in many-core systems
Proceedings of the System Level Interconnect Prediction Workshop
MiniTasking: improving cache performance for multiple query workloads
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
Optimization of dense matrix multiplication on IBM cyclops-64: challenges and experiences
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
A comparative analysis of performance improvement schemes for cache memories
Computers and Electrical Engineering
Trace-Based data layout optimizations for multi-core processors
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
MAC: migration-aware compilation for STT-RAM based hybrid cache in embedded systems
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Hi-index | 0.00 |
As the gap between memory and processor speeds continues to widen, cache eficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction cache pet$ormance by mapping code with temporal locality to different cache blocks in the virtual address space eliminating cache conflicts. These code placement techniques can be applied directly to the problem of placing data for improved data cache pedormance.In this paper we present a general framework for Cache Conscious Data Placement. This is a compiler directed approach that creates an address placement for the stack (local variables), global variables, heap objects, and constants in order to reduce data cache misses. The placement of data objects is guided by a temporal relationship graph between objects generated via profiling. Our results show that profile driven data placement significantly reduces the data miss rate by 24% on average.