Efficient control generation for mapping nested loop programs onto processor arrays

Authors:
Hritam Dutta;Frank Hannig;Holger Ruckdeschel;Jürgen Teich
Affiliations:
Department of Computer Science 12, Hardware-Software-Co-Design, University of Erlangen-Nuremberg, Am Weichselgarten 3, 91058 Erlangen, Bayern, Germany (all four authors)
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2007

Abstract

Processor array architectures are optimal platforms for computationally intensive applications. Such architectures are characterized by hierarchies of parallelism and memory structures: apart from different levels of cache, a processor array has a large number of processing elements (PEs), and each PE can in turn exploit sub-word parallelism. To handle large-scale problems, balance local memory requirements against I/O bandwidth, and use the different hierarchies of parallelism and memory, one needs a sophisticated transformation called hierarchical partitioning. The applications are inherently data-flow dominant and have almost no control flow, but applying hierarchical partitioning techniques has the disadvantage of making the control flow more complex. In a previous paper, the authors presented for the first time a methodology for automated control path synthesis for mapping partitioned algorithms onto processor arrays. However, the control path contained complex multiplication and division operators. In this paper, we propose a significant extension to the methodology that reduces the hardware cost of the global controller and memory address generators by avoiding these costly operations.
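
To make the last point concrete, the following is a minimal sketch of the general idea of replacing multiplications and divisions in address generation with counters and additions. It is not the method of the paper, whose details the abstract does not give. It uses generic strength reduction on a 2-D array traversed tile by tile. The function names, the row-major layout, and the assumption that the tile sizes divide the array dimensions evenly are all illustrative choices.

```python
def addresses_reference(rows, cols, tr, tc):
    """Row-major addresses of a tiled traversal, computed with multiplications."""
    out = []
    for bi in range(rows // tr):          # tile row
        for bj in range(cols // tc):      # tile column
            for i in range(tr):           # row inside the tile
                for j in range(tc):       # column inside the tile
                    out.append((bi * tr + i) * cols + bj * tc + j)
    return out


def addresses_incremental(rows, cols, tr, tc):
    """Same address sequence using only counters, comparisons, and additions.

    All constants below are fixed once the array and tile sizes are known,
    so in hardware they would be design-time constants (an assumption of
    this sketch), not values computed while the loop runs.
    """
    total = rows * cols                   # number of iterations
    tiles_per_row = cols // tc            # number of tile columns
    step_j = 1                            # next column inside the tile
    step_i = cols - (tc - 1)              # next row inside the tile
    step_bj = 1 - (tr - 1) * cols         # next tile in the same tile row
    step_bi = 1                           # first tile of the next tile row

    out = []
    addr = 0
    i = j = bj = 0                        # counters; no counter needed for bi
    for _ in range(total):
        out.append(addr)
        j += 1
        if j < tc:
            addr += step_j
            continue
        j = 0
        i += 1
        if i < tr:
            addr += step_i
            continue
        i = 0
        bj += 1
        if bj < tiles_per_row:
            addr += step_bj
            continue
        bj = 0
        addr += step_bi
    return out
```

Both functions produce the same sequence. The second only increments counters, compares them against constants, and adds a precomputed step to the running address, which is the kind of logic that tends to be cheap to realize in a hardware controller.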