Compiler-assisted data distribution for chip multiprocessors

Authors:
Yong Li;Ahmed Abousamra;Rami Melhem;Alex K. Jones
Affiliations:
University of Pittsburgh, Pittsburgh, PA, USA;University of Pittsburgh, Pittsburgh, PA, USA;University of Pittsburgh, Pittsburgh, PA, USA;University of Pittsburgh, Pittsburgh, PA, USA
Venue:
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Year:
2010

Citing 20
Cited 5

Direct parallelization of call statements

SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Efficient interprocedural analysis for program parallelization and restructuring

PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Splash 2

SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
PARADIGM: a compiler for automatic data distribution on multicomputers

ICS '93 Proceedings of the 7th international conference on Supercomputing
SUIF: an infrastructure for research on parallelizing and optimizing compilers

ACM SIGPLAN Notices
Gated SSA-based demand-driven symbolic analysis for parallelizing compilers

ICS '95 Proceedings of the 9th international conference on Supercomputing
Automatic data layout for distributed-memory machines

ACM Transactions on Programming Languages and Systems (TOPLAS)
An Advanced Compiler Framework for Non-Cache-Coherent Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Simics: A Full System Simulation Platform

Computer
Compile-Time Techniques for Data Distribution in Distributed Memory Machines

IEEE Transactions on Parallel and Distributed Systems
Exact versus Approximate Array Region Analyses

LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Communication-Minimal Partitioning of Parallel Loops and Data Arrays for Cache-Coherent Distributed-Memory Multiprocessors

LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Reduction of Cache Coherence Overhead by Compiler Data Layout and Loop Transformation

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Compiler-directed Data Partitioning for Multicluster Processors

Proceedings of the International Symposium on Code Generation and Optimization
Compilers: Principles, Techniques, and Tools (2nd Edition)

Compilers: Principles, Techniques, and Tools (2nd Edition)
Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Reactive NUCA: near-optimal block placement and replication in distributed caches

Proceedings of the 36th annual international symposium on Computer architecture
Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors

PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
SOS: A Software-Oriented Distributed Shared Cache Management Approach for Chip Multiprocessors

PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Structure-driven optimizations for amorphous data-parallel programs

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache

Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
A software approach for combating asymmetries of non-volatile memories

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Practically private: enabling high performance CMPs through compiler-assisted data classification

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Complexity-effective multicore coherence

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Towards efficient dynamic LLC home bank mapping with noc-level support

Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Data access latency, a limiting factor in the performance of chip multiprocessors, grows significantly with the number of cores in non-uniform cache architectures with distributed cache banks. To mitigate this effect, it is necessary to leverage the data access locality and choose an optimum data placement. Achieving this is especially challenging when other constraints such as cache capacity, coherence messages and runtime overhead need to be considered. This paper presents a compiler-based approach used for analyzing data access behavior in multi-threaded applications. The proposed experimental compiler framework employs novel compilation techniques to discover and represent multi-threaded memory access patterns (MMAPs). At run time, symbolic MMAPs are resolved and used by a partitioning algorithm to choose a partition of allocated memory blocks among the forked threads in the analyzed application. This partition is used to enforce data ownership by associating the data with the core that executes the thread owning the data. We demonstrate how this information can be used in an experimental architecture to accelerate applications. In particular, our compiler assisted approach shows a 20% speedup over shared caching and 5% speedup over the closest runtime approximation, "first touch".