Storage mapping optimization is a flexible approach to folding array dimensions in numerical codes. It is designed to reduce the memory footprint after a wide spectrum of loop transformations, whether these are based on uniform dependence vectors or on more expressive polyhedral abstractions. Conversely, few loop transformations have been proposed to facilitate register promotion, namely loop fusion, unroll-and-jam and tiling. Building on array data-flow analysis and expansion, we extend storage mapping optimization to improve opportunities for register promotion.

Our work is motivated by the empirical study of a computational biology benchmark, the approximate string matching algorithm BPR from NR-grep, on a wide-issue microarchitecture. Our experiments confirm the major benefit of register tiling (even on non-numerical benchmarks) but also shed light on two novel issues: prior array expansion may be necessary to enable the loop transformations that ultimately make register promotion profitable, and more advanced scheduling techniques (beyond tiling and unroll-and-jam) may significantly improve performance by fine-tuning register usage and instruction-level parallelism.
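To make the link between folding and register promotion concrete, here is a minimal C sketch built on a hypothetical recurrence. The `recur_*` function names and the recurrence itself are illustrative assumptions, not taken from the paper, and the sketch does not reproduce how the paper derives mappings from array data-flow analysis. In the recurrence, at most two elements of the temporary array `t` are live at any time. A modular storage mapping `t[i] -> t[i % 2]` therefore folds the array dimension to two cells. Register promotion then replaces those two cells with scalars that a compiler can keep in registers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fully expanded: t[0..n-1] occupies n memory cells. Returns t[n-1]. */
long recur_expanded(const int *x, size_t n) {
  if (n == 0) return 0;
  if (n == 1) return x[0];
  long *t = malloc(n * sizeof *t);
  if (t == NULL) abort();
  t[0] = x[0];
  t[1] = x[1];
  for (size_t i = 2; i < n; i++)
    t[i] = t[i - 1] + t[i - 2] + x[i];
  long r = t[n - 1];
  free(t);
  return r;
}

/* Storage mapping t[i] -> t[i % 2]: only two values are live across
   iterations, so the array dimension folds to two cells. */
long recur_folded(const int *x, size_t n) {
  if (n == 0) return 0;
  if (n == 1) return x[0];
  long t[2];
  t[0] = x[0];
  t[1] = x[1];
  for (size_t i = 2; i < n; i++)
    /* t[(i - 2) % 2] and t[i % 2] are the same cell: the old value is
       read on the right-hand side before the assignment overwrites it. */
    t[i % 2] = t[(i - 1) % 2] + t[(i - 2) % 2] + x[i];
  return t[(n - 1) % 2];
}

/* Register promotion: the two folded cells become a rotating pair of
   scalars, so the loop body performs no stores to t at all. */
long recur_promoted(const int *x, size_t n) {
  if (n == 0) return 0;
  if (n == 1) return x[0];
  long older = x[0], newer = x[1];
  for (size_t i = 2; i < n; i++) {
    long cur = newer + older + x[i];
    older = newer;
    newer = cur;
  }
  return newer;
}
```

For `x = {1, 2, 3, 4, 5}` all three versions compute `t = 1, 2, 6, 12, 23` and return 23. The folded version makes the memory footprint independent of `n`. The promoted version additionally removes the array accesses from the loop body, which is the kind of opportunity the abstract describes storage mapping being extended to expose.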