Mapping data-parallel tasks onto partially reconfigurable hybrid processor architectures

Authors:
Krishna N. Vikram;Vinita Vasudevan
Affiliations:
Siemens Corporate Technology, Bangalore, India;Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai, India
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2006

Abstract

Reconfigurable hybrid processor systems provide a flexible platform for mapping data-parallel applications, while providing considerable speedup over software implementations. However, the overhead for reconfiguration presents a significant deterrent in mapping applications onto reconfigurable hardware. Partial runtime reconfiguration is one approach to reduce the reconfiguration overhead. In this paper, we present a methodology to map data-parallel tasks onto hardware that supports partial reconfiguration. The aim is to obtain the maximum possible speedup for a given reconfiguration time, bus speed, and computation speed. The proposed approach involves using multiple identical but independent processing units in the reconfigurable hardware. Under nonzero reconfiguration overhead, we show that there exists an upper limit on the number of processing units that can be employed, beyond which further reduction in execution time is not possible. We obtain solutions for the minimum processing time, the corresponding load distribution, and the schedule for data transfer. To demonstrate the applicability of the analysis, we present the following: 1) various plots showing the variation of processing time with different parameters; 2) hardware simulations for two examples, viz., 1-D discrete wavelet transform and finite impulse response filter, targeted to Xilinx field-programmable gate arrays (FPGAs); and 3) experimental results for a hardware prototype implemented on an FPGA board.
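
The claim that nonzero reconfiguration overhead caps the useful number of processing units can be illustrated with a deliberately simplified toy model. This is not the model derived in the paper. It assumes (hypothetically) that units are configured one after another through a single configuration port, that unit i is ready at time i * reconfig_time, that bus transfer time is ignored, and that the load is split so all units finish together. All function and parameter names are made up for this sketch.

```python
def finish_time(n, load, compute_time, reconfig_time):
    """Common finish time when n units share `load` (toy model).

    Unit i (1-based) starts at i * reconfig_time and needs compute_time
    per unit of load. Requiring all units to finish together gives
    T = compute_time * load / n + reconfig_time * (n + 1) / 2.
    """
    return compute_time * load / n + reconfig_time * (n + 1) / 2


def load_shares(n, load, compute_time, reconfig_time):
    """Load given to each unit so that every unit finishes at finish_time."""
    t = finish_time(n, load, compute_time, reconfig_time)
    return [(t - i * reconfig_time) / compute_time for i in range(1, n + 1)]


def best_unit_count(load, compute_time, reconfig_time, max_units):
    """Search for the unit count that minimises the finish time.

    Stops as soon as an extra unit would receive negative load or would
    not reduce the finish time any further.
    """
    best_n = 1
    best_t = finish_time(1, load, compute_time, reconfig_time)
    for n in range(2, max_units + 1):
        shares = load_shares(n, load, compute_time, reconfig_time)
        if min(shares) < 0:
            break  # the last unit is configured too late to be useful
        t = finish_time(n, load, compute_time, reconfig_time)
        if t >= best_t:
            break  # adding this unit no longer helps
        best_n, best_t = n, t
    return best_n, best_t


if __name__ == "__main__":
    # Nonzero overhead: the useful unit count saturates below max_units.
    print(best_unit_count(load=100, compute_time=1, reconfig_time=2, max_units=64))
    # Zero overhead: every available unit is used.
    print(best_unit_count(load=100, compute_time=1, reconfig_time=0, max_units=64))
```

In this toy setting, a load of 100 with unit compute time and a reconfiguration time of 2 saturates at 10 units, while zero reconfiguration time uses all 64 available units. The paper's analysis additionally accounts for bus speed and the data-transfer schedule, which this sketch omits.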