Structure Layout Optimization for Multithreaded Programs

Authors:
Easwaran Raman;Robert Hundt;Sandya Mannarswamy
Affiliations:
Princeton University;Hewlett-Packard Company;Hewlett-Packard Company
Venue:
Proceedings of the International Symposium on Code Generation and Optimization
Year:
2007

Abstract

Structure layout optimizations seek to improve runtime performance by improving data locality and reuse. The structure layout heuristics for single-threaded benchmarks differ from those for multithreaded applications running on multiprocessor machines, where the effects of false sharing need to be taken into account. In this paper we propose a technique for structure layout transformations for multithreaded applications that simultaneously optimizes for improved spatial locality and reduced false sharing. We develop a semi-automatic tool that produces actual structure layouts for multithreaded programs and outputs the key factors contributing to the layout decisions. We apply this tool to the HP-UX kernel and demonstrate the effects of these transformations for a variety of key structures that are already highly hand-tuned and that have different sets of properties. We show that naive heuristics can result in massive performance degradations on such a highly tuned application, while our technique generally avoids those pitfalls. The improved structures produced by our tool improve performance by up to 3.2% over a highly tuned baseline.