Efficient control generation for mapping nested loop programs onto processor arrays
Journal of Systems Architecture: the EUROMICRO Journal
PARO: Synthesis of Hardware Accelerators for Multi-dimensional Dataflow-Intensive Applications
ARC '08 Proceedings of the 4th international workshop on Reconfigurable Computing: Architectures, Tools and Applications
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
A holistic approach for tightly coupled reconfigurable parallel processors
Microprocessors & Microsystems
Parallelization Approaches for Hardware Accelerators --- Loop Unrolling Versus Loop Partitioning
ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
Efficient Data Access Management for FPGA-Based Image Processing SoCs
RSP '09 Proceedings of the 2009 IEEE/IFIP International Symposium on Rapid System Prototyping
Exploration of 3D grid caching strategies for ray-shooting
Journal of Real-Time Image Processing
Hi-index | 0.00 |
Processor arrays are used as accelerators for plenty of data flow-dominant applications. The explosive growth in research and development of massively parallel processor array architectures has lead to demand for mapping tools to realize the full potential of these architectures. Such architectures are characterized by hierarchies of parallelism and memory structures, i.e. processor array apart from different levels of cache arrays have a number of processing elements (PE) where each PE can further contain sub-word parallelism. In order to handle large scale problems, balance local memory requirements with I/O-bandwidth, and use different hierarchies of parallelism and memory, one needs a sophisticated transformation called hierarchical partitioning. In this paper, we introduce for the first time a detailed methodology encompassing hierarchical partitioning.