Processor array architectures are optimal platforms for computationally intensive applications. Such architectures are characterized by hierarchies of parallelism and memory structures, i.e., apart from different levels of cache, a processor array has a large number of processing elements (PEs), and each PE can in turn exploit sub-word parallelism. To handle large-scale problems, balance local memory requirements against I/O bandwidth, and use the different hierarchies of parallelism and memory, one needs a sophisticated transformation called hierarchical partitioning. The applications are inherently data-flow dominant and have almost no control flow, but applying hierarchical partitioning techniques has the disadvantage of making the control flow more complex. In a previous paper, the authors presented for the first time a methodology for automated control path synthesis for mapping partitioned algorithms onto processor arrays. However, the resulting control path contained complex multiplication and division operators. In this paper, we propose a significant extension to the methodology that reduces the hardware cost of the global controller and the memory address generators by avoiding these costly operations.
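To make the last point concrete, the following is a minimal sketch of the general idea of replacing division, modulo, and multiplication in loop control and address generation with counters and additions. It is an illustration only and not the methodology of the paper: the function names, the one-dimensional tiling, and the row-major address layout are all assumptions chosen for the example.

```python
# Illustrative sketch only; names, tiling scheme, and layout are hypothetical.

def indices_div_mod(n, tile):
    """Reference: derive (tile index, local index) with division and modulo."""
    return [(i // tile, i % tile) for i in range(n)]


def indices_counters(n, tile):
    """Same sequence using only increments, comparisons, and resets."""
    out = []
    outer, inner = 0, 0
    for _ in range(n):
        out.append((outer, inner))
        inner += 1
        if inner == tile:       # inner counter wraps: advance the tile counter
            inner = 0
            outer += 1
    return out


def addresses_incremental(rows, cols, row_stride):
    """Row-major addresses via running sums instead of i * row_stride + j."""
    out = []
    row_base = 0
    for _ in range(rows):
        addr = row_base
        for _ in range(cols):
            out.append(addr)
            addr += 1
        row_base += row_stride  # an addition replaces the multiplication
    return out
```

Both counter-based functions produce the same sequences as their arithmetic counterparts, so in hardware the divider and multiplier could in principle be replaced by small counters, comparators, and adders.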