Trace-Based data layout optimizations for multi-core processors

Authors:
Olga Golovanevsky;Alon Dayan;Ayal Zaks;David Edelsohn
Affiliations:
IBM Haifa Research Laboratory;IBM Haifa Research Laboratory;IBM Haifa Research Laboratory;IBM Watson Research Center
Venue:
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Year:
2010

Citing 20
Cited 0

Cache performance of operating system and multiprogramming workloads

ACM Transactions on Computer Systems (TOCS)
Static branch frequency and program profile analysis

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Cache-conscious data placement

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious structure definition

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Efficient representations and abstractions for quantifying and exploiting data reference locality

Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Data remapping for design space optimization of embedded memory systems

ACM Transactions on Embedded Computing Systems (TECS)
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework

IEEE Transactions on Parallel and Distributed Systems
Predicting whole-program locality through reuse distance analysis

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Array regrouping and structure splitting using whole-program reference affinity

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Lightweight reference affinity analysis

Proceedings of the 19th annual international conference on Supercomputing
Practical Structure Layout Optimization and Advice

Proceedings of the International Symposium on Code Generation and Optimization
Multi-compilation: capturing interactions among concurrently-executing applications

Proceedings of the 3rd conference on Computing frontiers
POWER5 System microarchitecture

IBM Journal of Research and Development - POWER5 and packaging
Locality approximation using time

Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies

ACM Transactions on Programming Languages and Systems (TOPLAS)
Structure Layout Optimization for Multithreaded Programs

Proceedings of the International Symposium on Code Generation and Optimization
Forma: A framework for safe automatic array reshaping

ACM Transactions on Programming Languages and Systems (TOPLAS)
MPADS: memory-pooling-assisted data splitting

Proceedings of the 7th international symposium on Memory management
Compiler techniques for reducing data cache miss rate on a multithreaded architecture

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers

Quantified Score

Hi-index	0.00

Visualization

Abstract

The focus of this paper is on cache-conscious data layout optimizations. Although these optimizations have already been adopted by industrial compilers, they were shown to be inefficient for multi-process applications on multi-core platforms. Such factors as asymmetric distribution of processes over hardware resources (cores, cpus or hardware threads), along with their temporal migrations, unpredictably influence optimization results. Herein we present a new methodology that extends classical data layout optimizations to support multi-core architectures. Based on data trace collection that reflects actual interleaving of data accesses, this method aims to improve spatial locality of the data, while mitigating potential false sharing events. Introduction of architectural characteristics into an analysis phase further increases the accuracy of data affinity estimation. Feasibility study of this method, applied to multi-process webserver lighttpd on Power5 machine, not only showed performance improvement, but also proved its suitability for incorporation into an industrial compiler.