Optimizing memory accesses for spatial computation

Authors:
Mihai Budiu; Seth C. Goldstein
Affiliations:
Carnegie Mellon University; Carnegie Mellon University
Venue:
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Year:
2003

Citing 14
Cited 7

The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Beyond induction variables

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Generalized dominators and post-dominators

POPL '92 Proceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Efficient accommodation of may-alias information in SSA form

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Sparse functional stores for imperative programs

IR '95 Papers from the 1995 ACM SIGPLAN workshop on Intermediate representations
Connection analysis: a practical interprocedural heap analysis for C

International Journal of Parallel Programming - Special issue: selected papers from the eighth international workshop on languages and compilers for parallel computing
Register promotion in C programs

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The SimpleScalar tool set, version 2.0

ACM SIGARCH Computer Architecture News
Putting pointer analysis to work

POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Partial redundancy elimination in SSA form

ACM Transactions on Programming Languages and Systems (TOPLAS)
Extended SSA Numbering: Introducing SSA Properties to Language with Multi-level Pointers

CC '98 Proceedings of the 7th International Conference on Compiler Construction
Effective Representation of Aliases and Indirect Memory Operations in SSA Form

CC '96 Proceedings of the 6th International Conference on Compiler Construction

Spatial computation

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Programmer specified pointer independence

MSP '04 Proceedings of the 2004 workshop on Memory system performance
SOMA: a tool for synthesizing and optimizing memory accesses in ASICs

CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Storage assignment during high-level synthesis for configurable architectures

ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Memory access optimization of dynamic binary translation for reconfigurable architectures

ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Reducing control overhead in dataflow architectures

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation

Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation

Abstract

In this paper we present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-based representation for memory, which compactly summarizes both control-flow and dependence information.

In CASH, memory optimization is a four-step process: (1) first, an initial, relatively coarse representation of memory dependences is built; (2) next, unnecessary memory dependences are removed using dependence tests; (3) third, redundant memory operations are removed; (4) finally, parallelism is increased by pipelining memory accesses in loops. While the first three steps are very general, the loop pipelining transformations are particularly applicable to spatial computation, which is the primary target of CASH.

The redundant memory removal optimizations presented are: load/store hoisting (subsuming partial redundancy elimination and common-subexpression elimination), load-after-store removal, store-before-store removal (dead store removal), and loop-invariant load motion.

One of our loop pipelining transformations is a new form of loop parallelization, called loop decoupling. This transformation separates independent memory accesses within a loop body into several independent loops, which are allowed to slip dynamically with respect to each other. A new computational primitive, a token generator, is used to dynamically control the amount of slip, allowing maximum freedom while guaranteeing that no memory dependences are violated.
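
The slip control described in the abstract can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not CASH's representation or hardware. The loop body (a store to a[i] followed by a load of a[i + d]), the `TokenGenerator` class, the `decoupled` scheduler, and the `load_bias` parameter are all hypothetical names chosen for illustration. The assumed dependence is that the load of a[i + d] in iteration i must happen before the store in iteration i + d, so the store loop may lead the load loop by at most d iterations.

```python
import random


def sequential(a, n, d):
    """Reference loop in program order: store a[i], then load a[i + d] (d > 0)."""
    a = list(a)
    loaded = []
    for i in range(n):
        a[i] = 100 + i
        loaded.append(a[i + d])
    return a, loaded


class TokenGenerator:
    """Hypothetical stand-in for a token generator: counts how many more
    store iterations may start before a load iteration must complete."""

    def __init__(self, slip):
        self.tokens = slip

    def try_take(self):
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False

    def give(self):
        self.tokens += 1


def decoupled(a, n, d, slip, load_bias=0.5, seed=0):
    """Run the store loop and the load loop as two separate loops.

    An arbitrary (seeded random) interleaving models dynamic slip.
    The store loop needs one token per iteration; each finished load
    iteration returns one token.
    """
    a = list(a)
    loaded = []
    tk = TokenGenerator(slip)
    rng = random.Random(seed)
    s = l = 0  # next store iteration, next load iteration
    while s < n or l < n:
        if l < n and (s >= n or rng.random() < load_bias):
            loaded.append(a[l + d])  # load loop advances freely
            l += 1
            tk.give()
        elif s < n and tk.try_take():
            a[s] = 100 + s  # store loop advances only with a token
            s += 1
        elif l < n:
            loaded.append(a[l + d])  # store stalled: let the load loop run
            l += 1
            tk.give()
    return a, loaded


if __name__ == "__main__":
    init = list(range(12))
    print(sequential(init, 8, 3) == decoupled(init, 8, 3, slip=3))
```

With `slip` equal to the dependence distance `d`, every interleaving in this model reproduces the sequential result. Granting more tokens than `d` lets the store loop overwrite locations the load loop has not yet read, which is the kind of violation the slip limit is meant to rule out.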