Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A program's cache performance can be improved by changing the organization and layout of its data, even for complex, pointer-based data structures. Previous techniques improved the cache performance of these structures by arranging distinct instances to increase reference locality. These techniques produced significant performance improvements, but worked best for small structures that could be packed into a cache block.

This paper extends that work by concentrating on the internal organization of fields in a data structure. It describes two techniques, structure splitting and field reordering, that improve the cache behavior of structures larger than a cache block. For structures comparable in size to a cache block, structure splitting can increase the number of hot fields that can be placed in a cache block. In five Java programs, structure splitting reduced cache miss rates by 10-27% and improved performance by 6-18% beyond the benefits of previously described cache-conscious reorganization techniques.

For large structures, which span many cache blocks, reordering fields to place those with high temporal affinity in the same cache block can also improve cache utilization. This paper describes bbcache, a tool that recommends C structure field reorderings. Preliminary measurements indicate that reordering fields in five active structures improves the performance of Microsoft SQL Server 7.0 by 2-3%.
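
The two techniques can be illustrated with a minimal sketch. It is not the paper's implementation and not what bbcache does. It uses Python's `ctypes` only to model C structure layouts, and everything in it is an assumption made for illustration: the record type, its field names, the choice of `key` and `next_index` as the hot fields, and the 64-byte cache block. It also assumes each instance starts on a block boundary.

```python
import ctypes

CACHE_BLOCK = 64  # bytes; an assumed block size, not a figure from the paper


def blocks_touched(struct_type, field_names, block_size=CACHE_BLOCK):
    """Return the cache-block indices covered by the named fields,
    assuming the structure instance starts on a block boundary."""
    blocks = set()
    for name in field_names:
        field = getattr(struct_type, name)  # ctypes field descriptor
        first = field.offset // block_size
        last = (field.offset + field.size - 1) // block_size
        blocks.update(range(first, last + 1))
    return blocks


# Hypothetical record: two frequently accessed (hot) fields separated
# by rarely accessed (cold) payload.
class Original(ctypes.Structure):
    _fields_ = [
        ("key", ctypes.c_uint32),              # hot
        ("description", ctypes.c_char * 124),  # cold
        ("next_index", ctypes.c_uint32),       # hot
        ("audit_note", ctypes.c_char * 60),    # cold
    ]


# Field reordering: same fields, hot ones declared next to each other.
class Reordered(ctypes.Structure):
    _fields_ = [
        ("key", ctypes.c_uint32),
        ("next_index", ctypes.c_uint32),
        ("description", ctypes.c_char * 124),
        ("audit_note", ctypes.c_char * 60),
    ]


# Structure splitting: cold fields move to a separate structure, and the
# hot part keeps only a link (here an index) to its cold counterpart.
class ColdPart(ctypes.Structure):
    _fields_ = [
        ("description", ctypes.c_char * 124),
        ("audit_note", ctypes.c_char * 60),
    ]


class HotPart(ctypes.Structure):
    _fields_ = [
        ("key", ctypes.c_uint32),
        ("next_index", ctypes.c_uint32),
        ("cold_index", ctypes.c_uint32),  # hypothetical link to a ColdPart
    ]


HOT_FIELDS = ("key", "next_index")

if __name__ == "__main__":
    print("original :", sorted(blocks_touched(Original, HOT_FIELDS)))
    print("reordered:", sorted(blocks_touched(Reordered, HOT_FIELDS)))
    print("hot parts per block:", CACHE_BLOCK // ctypes.sizeof(HotPart))
```

Under these assumptions, the hot fields of `Original` fall in two different cache blocks, while those of `Reordered` share one. After splitting, several `HotPart` instances fit in a single block, whereas a whole `Original` instance spans three. The sketch takes the hot fields as given; in practice, which fields are hot and which have high temporal affinity would have to come from profile data.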