PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Generalized dominators and post-dominators
POPL '92 Proceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Efficient accommodation of may-alias information in SSA form
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Sparse functional stores for imperative programs
IR '95 Papers from the 1995 ACM SIGPLAN workshop on Intermediate representations
Connection analysis: a practical interprocedural heap analysis for C
International Journal of Parallel Programming - Special issue: selected papers from the eighth international workshop on languages and compilers for parallel computing
Register promotion in C programs
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Putting pointer analysis to work
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Partial redundancy elimination in SSA form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Extended SSA Numbering: Introducing SSA Properties to Language with Multi-level Pointers
CC '98 Proceedings of the 7th International Conference on Compiler Construction
Effective Representation of Aliases and Indirect Memory Operations in SSA Form
CC '96 Proceedings of the 6th International Conference on Compiler Construction
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Programmer specified pointer independence
MSP '04 Proceedings of the 2004 workshop on Memory system performance
SOMA: a tool for synthesizing and optimizing memory accesses in ASICs
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Storage assignment during high-level synthesis for configurable architectures
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Memory access optimization of dynamic binary translation for reconfigurable architectures
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Reducing control overhead in dataflow architectures
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Hi-index | 0.00 |
In this paper we present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-based representation for memory, which compactly summarizes both control-flow-and dependence information.In CASH, memory optimization is a four-step process: (1)first an initial, relatively coarse, representation of memory dependences is built; (2) next, unnecessary memory dependences are removed using dependence tests; (3) third, redundant memory operations are removed (4)finally, parallelism is increased by pipelining memory accesses in loops. While the first three steps above are very general, the loop pipelining transformations are particularly applicable for spatial computation, which is the primary target of CASH.The redundant memory removal optimizations presented are: load/store hoisting (subsuming partial redundancy elimination and common-subexpression elimination), load-after-store removal, store-before-store removal (dead store removal) and loop-invariant load motion.One of our loop pipelining transformations is a new form of loop parallelization, called loop decoupling. This transformation separates independent memory accesses within a loop body into several independent loops, which are allowed dynamically to slip with respect to each other. A new computational primitive, a token generator is used to dynamically control the amount of slip, allowing maximum freedom, while guaranteeing that no memory dependences are violated.