We present a unified approach to locality optimization that employs both data and control transformations. Data transformations change the layout of arrays in memory. Control transformations change the execution order of a program. We have developed new compiler optimization techniques for distributed shared-memory machines, although the same techniques can be used for sequential machines with a memory hierarchy.

Our compiler optimizations are based on an algebraic representation of data mappings and a new data locality model. We present a pure data transformation algorithm and an algorithm unifying data and control transformations. While there has been much work on control transformations, the opportunities for data transformations have been largely neglected. In fact, data transformations have the advantage of being applicable to programs that cannot be optimized with control transformations. The unified algorithm, which performs data and control transformations simultaneously, improves on the optimizations obtained by applying data and control transformations separately.

Experimental results for a set of applications on a parallel machine show that the new optimizations improve performance significantly. These results are further analyzed using locality metrics obtained through instrumentation and simulation.
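To make the distinction between the two kinds of transformation concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a single 2-D array swept by a doubly nested loop, and it uses a deliberately crude locality metric: the number of times consecutive accesses land in different cache lines. The names `row_major`, `col_major`, `trace`, and `line_switches`, as well as the array size and line size, are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a toy model of data vs. control transformations.
# All function names and parameters here are assumptions, not from the paper.

def row_major(i, j, n_rows, n_cols):
    """Original layout: element (i, j) is stored at offset i * n_cols + j."""
    return i * n_cols + j


def col_major(i, j, n_rows, n_cols):
    """Data transformation: store element (i, j) at offset j * n_rows + i."""
    return j * n_rows + i


def trace(order, layout, n_rows, n_cols):
    """Return the linear addresses touched by a full sweep of the array.

    order == "ij": i is the outer loop, j the inner loop.
    order == "ji": j is the outer loop, i the inner loop (loop interchange).
    """
    if order == "ij":
        pairs = ((i, j) for i in range(n_rows) for j in range(n_cols))
    elif order == "ji":
        pairs = ((i, j) for j in range(n_cols) for i in range(n_rows))
    else:
        raise ValueError("order must be 'ij' or 'ji'")
    return [layout(i, j, n_rows, n_cols) for i, j in pairs]


def line_switches(addresses, line_size):
    """Count consecutive access pairs that fall in different cache lines."""
    lines = [a // line_size for a in addresses]
    return sum(1 for a, b in zip(lines, lines[1:]) if a != b)


N, LINE = 8, 4  # assumed 8x8 array and 4 elements per cache line

# Column-wise sweep over a row-major array: poor spatial locality.
baseline = line_switches(trace("ji", row_major, N, N), LINE)

# Control transformation: interchange the loops, keep the layout.
after_control = line_switches(trace("ij", row_major, N, N), LINE)

# Data transformation: keep the loop order, change the layout.
after_data = line_switches(trace("ji", col_major, N, N), LINE)
```

In this toy model the column-wise sweep switches cache lines on every access (63 switches), while either interchanging the loops or switching to a column-major layout makes the sweep sequential (15 switches). The layout change reaches the same result without reordering any iterations, which is the situation the abstract refers to when it notes that data transformations remain applicable when control transformations are not, for example when dependences forbid the interchange.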