Stencils and problem partitionings: their influence on the performance of multiple processor systems
IEEE Transactions on Computers
ICS '88 Proceedings of the 2nd international conference on Supercomputing
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Tiling multidimensional iteration spaces for nonshared memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Array-data flow analysis and its use in array privatization
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A singular loop transformation framework based on non-singular matrices
International Journal of Parallel Programming
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Microparallelism and high-performance protein matching
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Interprocedural array region analyses
International Journal of Parallel Programming - Special issue: selected papers from the eighth international workshop on languages and compilers for parallel computing
Plugging anti and output dependence removal techniques into loop parallelization algorithm
Parallel Computing - Special double issue on environment and tools for parallel scientific computing
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Optimizing programs by data and control transformations
Optimizing programs by data and control transformations
Automatic storage management for parallel programs
Parallel Computing - Special issues on languages and compilers for parallel computers
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
An Exact Method for Analysis of Value-based Array Data Dependences
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Iteration Space Tiling for Memory Hierarchies
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
New tiling techniques to improve cache temporal locality
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
CROPS: coordinated restructuring of programs and storage
ACM SIGSOFT Software Engineering Notes
Optimizing memory usage in the polyhedral model
ACM Transactions on Programming Languages and Systems (TOPLAS)
Data locality enhancement by memory reduction
ICS '01 Proceedings of the 15th international conference on Supercomputing
A unified framework for schedule and storage optimization
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Storage Mapping Optimization for Parallel Programs
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Better tiling and array contraction for compiling scientific programs
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Lattice-based memory allocation
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Applications of storage mapping optimization to register promotion
Proceedings of the 18th annual international conference on Supercomputing
Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiling for memory emergency
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Improving whole-program locality using intra-procedural and inter-procedural transformations
Journal of Parallel and Distributed Computing
Lattice-Based Memory Allocation
IEEE Transactions on Computers
Energy-aware computation duplication for improving reliability in embedded chip multiprocessors
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Compiler-directed selective data protection against soft errors
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
International Journal of Parallel Programming
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A step towards unifying schedule and storage optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
Code-size conscious pipelining of imperfectly nested loops
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Efficient Method for Periodic Task Scheduling with Storage Requirement Minimization
COCOA 2008 Proceedings of the 2nd international conference on Combinatorial Optimization and Applications
Reducing memory requirements of resource-constrained applications
ACM Transactions on Embedded Computing Systems (TECS)
Periodic register saturation in innermost loops
Parallel Computing
Loop transformations for reducing data space requirements of resource-constrained applications
SAS'03 Proceedings of the 10th international conference on Static analysis
Early control of register pressure for software pipelined loops
CC'03 Proceedings of the 12th international conference on Compiler construction
Data locality and parallelism optimization using a constraint-based approach
Journal of Parallel and Distributed Computing
SIRALINA: efficient two-steps heuristic for storage optimisation in single period task scheduling
Journal of Combinatorial Optimization
Memory space conscious loop iteration duplication for reliable execution
SAS'05 Proceedings of the 12th international conference on Static Analysis
Hi-index | 0.00 |
This paper studies the relationship between storage requirements and performance. Storage-related dependences inhibit optimizations for locality and parallelism. Techniques such as renaming and array expansion can eliminate all storage-related dependences, but do so at the expense of increased storage. This paper introduces the universal occupancy vector (UOV) for loops with a regular stencil of dependences. The UOV provides a schedule-independent storage reuse pattern that introduces no further dependences (other than those implied by true flow dependences). OV-mapped code requires less storage than full array expansion and only slightly more storage than schedule-dependent minimal storage.We show that determine if a vector is a UOV is NPcomplete. However, an easily constructed but possibly nonminimal UOV can be used. We also present a branch and bound algorithm which finds the minimal UOV, while still maintaining a legal UOV at all times.Our experimental results show that the use of OV-mapped storage, coupled with tiling for locality, achieves better performance than tiling after array expansion, and accommodates larger problem sizes than untilable, storage-optimized code. F'urthermore, storage mapping based on the UOV introduces negligible runtime overhead.