Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Using lifetime predictors to improve memory allocation performance
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Computer organization & design: the hardware/software interface
Computer organization & design: the hardware/software interface
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The influence of caches on the performance of heaps
Journal of Experimental Algorithmics (JEA)
New faster Kernighan-Lin-type graph-partitioning algorithms
ICCAD '93 Proceedings of the 1993 IEEE/ACM international conference on Computer-aided design
Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
Segregating heap objects by reference behavior and lifetime
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Parametric shape analysis via 3-valued logic
Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Cache-conscious structure definition
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An automatic object inlining optimization and its evaluation
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Automated data-member layout of heap objects to improve memory-hierarchy performance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Properties of the working-set model
Communications of the ACM
Efficient representations and abstractions for quantifying and exploiting data reference locality
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Exact analysis of the cache behavior of nested loops
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
The hardness of cache conscious data placement
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Machine Learning
Virtual Cache Line: A New Technique to Improve Cache Exploitation for Recursive Data Structures
CC '99 Proceedings of the 8th International Conference on Compiler Construction, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS'99
Cache Behavior Prediction by Abstract Interpretation
SAS '96 Proceedings of the Third International Symposium on Static Analysis
Improving Cache Behavior of Dynamically Allocated Data Structures
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Dynamic hot data stream prefetching for general-purpose programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Exposing Memory Access Regularities Using Object-Relative Memory Profiling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Compositional static instruction cache simulation
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Design space exploration of caches using compressed traces
Proceedings of the 18th annual international conference on Supercomputing
The garbage collection advantage: improving program locality
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Using trace analysis for improving performance in COTS systems
CASCON '04 Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Memory Profiling using Hardware Counters
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
A sample-based cache mapping scheme
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Replicating memory behavior for performance prediction
LCR '04 Proceedings of the 7th workshop on Workshop on languages, compilers, and run-time support for scalable systems
Whole execution traces and their applications
ACM Transactions on Architecture and Code Optimization (TACO)
Memory access pattern analysis and stream cache design for multimedia applications
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Profiling over Adaptive Ranges
Proceedings of the International Symposium on Code Generation and Optimization
Practical Structure Layout Optimization and Advice
Proceedings of the International Symposium on Code Generation and Optimization
ALITER: an asynchronous lightweight instrumentation tool for event recording
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Decomposing memory performance: data structures and phases
Proceedings of the 5th international symposium on Memory management
Cache-conscious coallocation of hot data streams
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
DEP: detailed execution profile
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Whole-program optimization of global variable layout
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
HeapMD: identifying heap-based bugs using anomaly detection
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Online optimizations driven by hardware performance monitoring
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Structure Layout Optimization for Multithreaded Programs
Proceedings of the International Symposium on Code Generation and Optimization
External memory page remapping for embedded multimedia systems
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Data layouts for object-oriented programs
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
AjaxScope: a platform for remotely monitoring the client-side behavior of web 2.0 applications
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Formulating and implementing profiling over adaptive ranges
ACM Transactions on Architecture and Code Optimization (TACO)
Hardware-compiler co-design for adjustable data power savings
Microprocessors & Microsystems
A compiler framework for general memory layout optimizations targeting structures
Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture
Composition-based Cache simulation for structure reorganization
Journal of Systems Architecture: the EUROMICRO Journal
Exploiting stability to reduce time-space cost for memory tracing
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartIII
Studying microarchitectural structures with object code reordering
Proceedings of the Workshop on Binary Instrumentation and Applications
Evaluating the accuracy of Java profilers
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
AjaxScope: A Platform for Remotely Monitoring the Client-Side Behavior of Web 2.0 Applications
ACM Transactions on the Web (TWEB)
Automatic memory partitioning: increasing memory parallelism via data structure partitioning
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A cost-intelligent application-specific data layout scheme for parallel file systems
Proceedings of the 20th international symposium on High performance distributed computing
On-the-fly structure splitting for heap objects
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Data-Layout optimization using reuse distance distribution
EUC'06 Proceedings of the 2006 international conference on Emerging Directions in Embedded and Ubiquitous Computing
Using memory profile analysis for automatic synthesis of pointers code
ACM Transactions on Embedded Computing Systems (TECS)
Cachetor: detecting cacheable data to remove bloat
Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A scalable and near-optimal representation of access schemes for memory management
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Data-layout optimizations rearrange fields within objects, objects within objects, and objects within the heap, with the goal of increasing spatial locality. While the importance of data-layout optimizations has been growing, their deployment has been limited, partly because they lack a unifying framework. We propose a parameterizable framework for data-layout optimization of general-purpose applications. Acknowledging that finding an optimal layout is not only NP-hard, but also poorly approximable, our framework finds a good layout by searching the space of possible layouts, with the help of profile feedback. The search process iteratively prototypes candidate data layouts, evaluating them by "simulating" the program on a representative trace of memory accesses. To make the search process practical, we develop space-reduction heuristics and optimize the expensive simulation via memoization. Equipped with this iterative approach, we can synthesize layouts that outperform existing non-iterative heuristics, tune application-specific memory allocators, as well as compose multiple data-layout optimizations.