POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
A compiler technique for improving whole-program locality
POPL '01 Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance
WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
Efficient Parallelization using Combined Loop and Data Transformations
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Custom Data Layout for Memory Parallelism
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Integrating loop and data optimizations for locality within a constraint network based framework
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Trade-offs in loop transformations
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
Cache topology aware computation mapping for multicores
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
A framework for automatic parallelization, static and dynamic memory optimization in MPSoC platforms
Proceedings of the 47th Design Automation Conference
Automatic Loop Tiling for Direct Memory Access
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Optimizing Data Layouts for Parallel Computation on Multicores
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Optimizing data locality using array tiling
Proceedings of the International Conference on Computer-Aided Design
Combined loop transformation and hierarchy allocation for data reuse optimization
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.00 |
Motivated by the observation that most existing data locality optimizations do not specifically target shared last-level caches of emerging multicores and that even multicore-specific locality-oriented techniques employ either loop or data layout optimizations but not both, in this paper we present an integrated loop and data layout optimization strategy, with the goal of improving the last-level cache performance of multicores that execute multithreaded applications. We present a detailed mathematical formulation of our locality optimization strategy and present experimental data from our current implementation. Our results, collected using 14 application programs, clearly show that the proposed integrated approach is very successful in practice, and outperforms both pure loop optimization and pure data layout optimization based alternatives. Our results also indicate that the savings achieved increase with increased core count and larger data set sizes.