Direct parallelization of call statements
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Efficient interprocedural analysis for program parallelization and restructuring
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
PARADIGM: a compiler for automatic data distribution on multicomputers
ICS '93 Proceedings of the 7th international conference on Supercomputing
SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
Gated SSA-based demand-driven symbolic analysis for parallelizing compilers
ICS '95 Proceedings of the 9th international conference on Supercomputing
Automatic data layout for distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
An Advanced Compiler Framework for Non-Cache-Coherent Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems
Exact versus Approximate Array Region Analyses
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Reduction of Cache Coherence Overhead by Compiler Data Layout and Loop Transformation
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Compiler-directed Data Partitioning for Multicluster Processors
Proceedings of the International Symposium on Code Generation and Optimization
Compilers: Principles, Techniques, and Tools (2nd Edition)
Compilers: Principles, Techniques, and Tools (2nd Edition)
Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Reactive NUCA: near-optimal block placement and replication in distributed caches
Proceedings of the 36th annual international symposium on Computer architecture
Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
SOS: A Software-Oriented Distributed Shared Cache Management Approach for Chip Multiprocessors
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Structure-driven optimizations for amorphous data-parallel programs
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
A software approach for combating asymmetries of non-volatile memories
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Practically private: enabling high performance CMPs through compiler-assisted data classification
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Complexity-effective multicore coherence
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Towards efficient dynamic LLC home bank mapping with noc-level support
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Hi-index | 0.00 |
Data access latency, a limiting factor in the performance of chip multiprocessors, grows significantly with the number of cores in non-uniform cache architectures with distributed cache banks. To mitigate this effect, it is necessary to leverage the data access locality and choose an optimum data placement. Achieving this is especially challenging when other constraints such as cache capacity, coherence messages and runtime overhead need to be considered. This paper presents a compiler-based approach used for analyzing data access behavior in multi-threaded applications. The proposed experimental compiler framework employs novel compilation techniques to discover and represent multi-threaded memory access patterns (MMAPs). At run time, symbolic MMAPs are resolved and used by a partitioning algorithm to choose a partition of allocated memory blocks among the forked threads in the analyzed application. This partition is used to enforce data ownership by associating the data with the core that executes the thread owning the data. We demonstrate how this information can be used in an experimental architecture to accelerate applications. In particular, our compiler assisted approach shows a 20% speedup over shared caching and 5% speedup over the closest runtime approximation, "first touch".