Schedule-independent storage mapping for loops

Authors:
Michelle Mills Strout;Larry Carter;Jeanne Ferrante;Beth Simon
Affiliations:
CSE Department, UC San Diego, 9500 Gilman Drive, La Jolla, CA;CSE Department, UC San Diego, 9500 Gilman Drive, La Jolla, CA;CSE Department, UC San Diego, 9500 Gilman Drive, La Jolla, CA;CSE Department, UC San Diego, 9500 Gilman Drive, La Jolla, CA
Venue:
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Year:
1998


Abstract

This paper studies the relationship between storage requirements and performance. Storage-related dependences inhibit optimizations for locality and parallelism. Techniques such as renaming and array expansion can eliminate all storage-related dependences, but do so at the expense of increased storage. This paper introduces the universal occupancy vector (UOV) for loops with a regular stencil of dependences. The UOV provides a schedule-independent storage reuse pattern that introduces no further dependences (other than those implied by true flow dependences). OV-mapped code requires less storage than full array expansion and only slightly more storage than schedule-dependent minimal storage.

We show that determining whether a vector is a UOV is NP-complete. However, an easily constructed but possibly nonminimal UOV can be used. We also present a branch and bound algorithm that finds the minimal UOV while maintaining a legal UOV at all times.

Our experimental results show that the use of OV-mapped storage, coupled with tiling for locality, achieves better performance than tiling after array expansion, and accommodates larger problem sizes than untilable, storage-optimized code. Furthermore, storage mapping based on the UOV introduces negligible runtime overhead.
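
To make the idea of an occupancy vector concrete, the sketch below checks a candidate vector against a dependence stencil. It is an illustration only, not the paper's algorithm. It assumes one reading of the abstract's requirement: iteration q + ov may overwrite the storage of iteration q under every legal schedule only if q + ov depends, directly or transitively, on every consumer q + d of q's value, which holds when ov - d is a sum of stencil vectors with non-negative integer counts. The function names (`is_nonneg_combination`, `is_uov`, `storage_index`) and the three-point stencil are hypothetical, and the brute-force search assumes every stencil vector has a positive first component so that it terminates.

```python
from functools import lru_cache


def is_nonneg_combination(target, stencil):
    """Return True if target is a sum of stencil vectors (repeats allowed).

    Assumption: every stencil vector has a positive first component,
    so the first component strictly decreases and the search terminates.
    """
    stencil = tuple(tuple(d) for d in stencil)
    assert all(d[0] > 0 for d in stencil), "sketch needs time-like vectors"

    @lru_cache(maxsize=None)
    def reachable(v):
        if all(x == 0 for x in v):
            return True   # empty sum: nothing left to cover
        if v[0] <= 0:
            return False  # cannot be reached with positive first components
        return any(
            reachable(tuple(a - b for a, b in zip(v, d))) for d in stencil
        )

    return reachable(tuple(target))


def is_uov(ov, stencil):
    """Check the assumed condition: ov - d is reachable for every d."""
    return all(
        is_nonneg_combination(tuple(a - b for a, b in zip(ov, d)), stencil)
        for d in stencil
    )


def storage_index(t, i):
    """Storage mapping for the occupancy vector (2, 0) in this example:
    iterations (t, i) and (t + 2, i) share one location, so two rows suffice."""
    return (t % 2, i)


# Hypothetical three-point stencil: A[t][i] reads A[t-1][i-1..i+1].
STENCIL = [(1, -1), (1, 0), (1, 1)]
SUM_OF_STENCIL = tuple(sum(c) for c in zip(*STENCIL))  # (3, 0)
```

Under this check, `is_uov((2, 0), STENCIL)` holds while `is_uov((1, 0), STENCIL)` does not, and the sum of the stencil vectors always passes because subtracting any one of them leaves a sum of the others. The exhaustive search is exponential in the worst case, which is consistent with the NP-completeness result stated in the abstract.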