Configurable architectures with multiple independent on-chip RAM modules offer a unique opportunity to exploit the parallel memory accesses inherent in a sequential program by tailoring not only the number and configuration of the modules in the resulting hardware design but also the accesses to them. In this paper we explore the possibility of array replication for loop computations that is beyond the reach of traditional privatization and parallelization analyses. We present a compiler analysis that identifies portions of array variables that can be temporarily replicated during the execution of a given loop iteration, enabling the concurrent execution of statements or even of non-perfectly nested loops. On configurable architectures, where array replication is essentially free in terms of execution time, this replication not only enables parallel execution but also reduces or even eliminates memory contention. We present preliminary experiments applying the proposed technique to hardware designs for commercially available FPGA devices.
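To make the idea of per-iteration array replication concrete, here is a minimal software sketch. It is not the paper's analysis or its generated hardware; the loop body, the function names `run_original` and `run_replicated`, and the variables `a_copy0` and `a_copy1` are all hypothetical, chosen only for illustration. The sketch assumes a loop in which one statement produces an array section and two later loops only read it. Giving each reader its own copy, written at the same time as the original, is what would let the two readers proceed side by side if each copy lived in its own RAM module.

```python
def run_original(n_iters, size):
    """Baseline: both consumer loops read the same array `a`."""
    sums, maxes = [], []
    for i in range(n_iters):
        # Producer: fill the array section used in this iteration.
        a = [(i + 1) * j for j in range(size)]
        # Consumer 1: reads every element of `a`.
        s = 0
        for j in range(size):
            s += a[j]
        # Consumer 2: also reads `a`; with a single memory module its
        # reads would compete with consumer 1 for the same port.
        m = a[0]
        for j in range(1, size):
            m = max(m, a[j])
        sums.append(s)
        maxes.append(m)
    return sums, maxes


def run_replicated(n_iters, size):
    """Same computation with `a` replicated once per consumer."""
    sums, maxes = [], []
    for i in range(n_iters):
        # Hypothetical replicas, imagined as two independent RAM modules.
        a_copy0 = [0] * size
        a_copy1 = [0] * size
        for j in range(size):
            v = (i + 1) * j
            a_copy0[j] = v  # the producer writes both replicas together
            a_copy1[j] = v
        # Each consumer touches only its own replica, so the two loops
        # no longer share a memory. They are interleaved here to mimic
        # lock-step concurrent execution in sequential Python.
        s = 0
        m = a_copy1[0]
        for j in range(size):
            s += a_copy0[j]
            m = max(m, a_copy1[j])
        sums.append(s)
        maxes.append(m)
    return sums, maxes
```

Both functions return the same per-iteration sums and maxima, which is the property a replication analysis has to guarantee: the replicas are live only within one iteration, and no consumer writes to them. In software the extra copy costs time and memory; the abstract's point is that on a configurable architecture the duplicate write can be essentially free.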