Array replication to increase parallelism in applications mapped to configurable architectures

Authors:
Heidi E. Ziegler;Priyadarshini L. Malusare;Pedro C. Diniz
Affiliations:
University of Southern California / Information Sciences Institute, Marina del Rey, California;University of Southern California / Information Sciences Institute, Marina del Rey, California;University of Southern California / Information Sciences Institute, Marina del Rey, California
Venue:
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Year:
2005

Citing (11):
Automatic translation of FORTRAN programs to vector form. ACM Transactions on Programming Languages and Systems (TOPLAS)
A framework for determining useful parallelism. ICS '88 Proceedings of the 2nd international conference on Supercomputing
A technique for summarizing data access and its use in parallelism enhancing transformations. PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Array privatization for parallel execution of loops. ICS '92 Proceedings of the 6th international conference on Supercomputing
Compiler optimizations for eliminating barrier synchronization. PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
PipeRench: a co/processor for streaming multimedia acceleration. ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Eliminating synchronization bottlenecks in object-based programs using adaptive replication. ICS '99 Proceedings of the 13th international conference on Supercomputing
Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs. Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Automatic Array Privatization. Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Compiler-generated communication for pipelined FPGA applications. Proceedings of the 40th annual Design Automation Conference
Custom Data Layout for Memory Parallelism. Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization

Cited by (3):
Compiling for reconfigurable computing: A survey. ACM Computing Surveys (CSUR)
Code transformations for embedded reconfigurable computing architectures. GTTSE'09 Proceedings of the 3rd international summer school conference on Generative and transformational techniques in software engineering III
Parallel replication-based points-to analysis. CC'12 Proceedings of the 21st international conference on Compiler Construction



Abstract

Configurable architectures with multiple independent on-chip RAM modules offer a unique opportunity to exploit the parallel memory accesses inherent in a sequential program by tailoring not only the number and configuration of the modules in the resulting hardware design but also the accesses to them. In this paper we explore opportunities for array replication in loop computations that lie beyond the reach of traditional privatization and parallelization analyses. We present a compiler analysis that identifies portions of array variables that can be temporarily replicated within the execution of a given loop iteration, enabling the concurrent execution of statements or even of non-perfectly nested loops. For configurable architectures where array replication is essentially free in terms of execution time, this replication not only enables parallel execution but also reduces or even eliminates memory contention. We present preliminary experiments applying the proposed technique to hardware designs for commercially available FPGA devices.
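The idea in the abstract can be illustrated with a small software model. The sketch below is not the authors' analysis or implementation; it only assumes a simplified cost model in which each RAM module serves one access per cycle and a write can be broadcast to several modules at no extra cost. The function names, the `tmp` array, and the cycle counting are all hypothetical and chosen for illustration.

```python
# Illustrative model only: one access per RAM module per cycle (assumption),
# and a broadcast write to two modules costs the same as one write (assumption).

def run_without_replication(data):
    """A single RAM holds tmp, so the two consumer loops must take turns."""
    cycles = 0
    sums, maxes = [], []
    for row in data:                      # rows are assumed non-empty
        tmp = [2 * x for x in row]        # producer writes tmp
        cycles += len(row)
        s = 0
        for v in tmp:                     # consumer loop 1 reads tmp
            s += v
        cycles += len(tmp)
        m = tmp[0]
        for v in tmp:                     # consumer loop 2 reads tmp again
            m = max(m, v)
        cycles += len(tmp)
        sums.append(s)
        maxes.append(m)
    return sums, maxes, cycles


def run_with_replication(data):
    """tmp is replicated for the duration of one outer iteration."""
    cycles = 0
    sums, maxes = [], []
    for row in data:                      # rows are assumed non-empty
        tmp_a = [2 * x for x in row]      # producer writes copy A
        tmp_b = list(tmp_a)               # copy B, modelled as a broadcast write
        cycles += len(row)
        s, m = 0, tmp_b[0]
        for va, vb in zip(tmp_a, tmp_b):  # both consumers advance in lockstep,
            s += va                       # each reading its own copy
            m = max(m, vb)
        cycles += len(row)
        sums.append(s)
        maxes.append(m)
    return sums, maxes, cycles
```

Under these assumptions, both versions compute the same sums and maxima for an input such as `[[1, 2, 3], [4, 5, 6]]`, while the modelled cycle count drops from 18 to 12 because the two consumer loops no longer contend for one memory.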