Practical Structure Layout Optimization and Advice

Authors:
Robert Hundt; Sandya Mannarswamy; Dhruva Chakrabarti
Affiliations:
Hewlett-Packard Company; Hewlett-Packard Company; Hewlett-Packard Company
Venue:
Proceedings of the International Symposium on Code Generation and Optimization
Year:
2006

Abstract

With the gap between processor clock frequency and memory latency ever increasing, and with the standard locality-improving transformations maturing, compilers increasingly seek to modify an application's data layout to improve spatial and temporal locality and to reduce cache-miss and page-fault penalties. In this paper we describe a practical implementation of the data layout optimizations Structure Splitting, Structure Peeling, Structure Field Reordering, and Dead Field Removal, for both profile-based and non-profile-based compilations. We demonstrate significant performance gains, but find that automatic transformations fail for a relatively high number of record types because of legality violations or profitability constraints. Additionally, we find a class of desirable transformations for which the framework cannot provide satisfying results. To address this issue, we complement the automatic transformations with an advisory tool. We reuse the compiler analysis done for automatic transformation and correlate its results with performance data collected at run time for structure fields, such as data cache misses and latencies. We then use the compiler as a performance analysis and reporting tool and provide insight into how to lay out structure types more efficiently.
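
To make the four transformations named in the abstract concrete, the following is a minimal C sketch. It is not the paper's implementation. The record type `particle`, its fields, the hot/cold classification, and the function names are all hypothetical and chosen only for illustration; in a real compiler those decisions would come from legality analysis and, where available, profile data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical original record: hot and cold fields are interleaved. */
struct particle {
    double x;           /* hot: read and written in the inner loop */
    char   label[48];   /* cold: assumed to be used only for reporting */
    double v;           /* hot: read in the inner loop */
    long   debug_id;    /* assumed dead: written once, never read */
};

/* Field reordering plus dead field removal: the hot fields become
   adjacent and the never-read field is dropped. */
struct particle_reordered {
    double x;
    double v;
    char   label[48];
};

/* Structure splitting: cold fields move to a separate record that the
   hot record reaches through a link pointer. */
struct particle_cold {
    char label[48];
};
struct particle_split {
    double x;
    double v;
    struct particle_cold *cold;
};

/* Structure peeling: hot and cold parts live in parallel arrays that are
   indexed the same way, so no link pointer is needed. */
struct particle_hot {
    double x;
    double v;
};

/* Inner loop over the original layout: the 48 cold bytes sit between the
   two hot fields of every element. */
double advance_orig(struct particle *p, size_t n, double dt) {
    assert(p != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        p[i].x += p[i].v * dt;
        sum += p[i].x;
    }
    return sum;
}

/* Same loop over the peeled hot array: only hot fields are touched. */
double advance_peeled(struct particle_hot *p, size_t n, double dt) {
    assert(p != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        p[i].x += p[i].v * dt;
        sum += p[i].x;
    }
    return sum;
}
```

Both loops compute the same result; only the memory footprint of the traversal differs. On a typical LP64 platform `struct particle` occupies 72 bytes while `struct particle_hot` occupies 16, so four hot elements fit in one 64-byte cache line instead of fewer than one. Whether such a rewrite is legal depends on how the program uses the type (for example, casts or address arithmetic on fields), which is the kind of legality constraint the abstract refers to.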