This paper presents a synchronization framework for heterogeneous processing elements (PEs) that operate in parallel under the control of a RISC processor. When the RISC is not closely attached to the PEs, the communication delay between them becomes a key issue. Recent synchronization approaches either neglect communication delays or require them to be low, which results in a low synchronization rate between the RISC and the PEs. To overcome this delay, a dedicated hardware-based synchronization approach is proposed that reduces the communication overhead and increases the number of tasks that can be executed per unit of time. It also supports the parallel execution of independent hardware tasks. The approach was evaluated on a modular coprocessor architecture containing several PEs for image processing tasks. The coarse-grained parallel execution of independent tasks significantly improves the speed of an example application: aerial-image-based vehicle detection on straight highways.
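The effect of communication delay on task throughput can be illustrated with a back-of-the-envelope model. The sketch below is not taken from the paper: the cost model (a fixed synchronization overhead added to every task, and a round trip of two one-way delays when the RISC handles synchronization) and all function names and numbers are assumptions chosen only to show why moving synchronization closer to the PEs raises the number of tasks per time unit, and why independent tasks benefit from running in parallel.

```python
def tasks_per_time_unit(exec_time, sync_overhead):
    """Throughput when every task pays a fixed synchronization overhead."""
    return 1.0 / (exec_time + sync_overhead)


def risc_round_trip_rate(exec_time, one_way_delay):
    # Assumed model: the PE reports completion to the RISC and the RISC
    # replies with the next start command, so each task pays two delays.
    return tasks_per_time_unit(exec_time, 2 * one_way_delay)


def hardware_sync_rate(exec_time, local_overhead):
    # Assumed model: a synchronization unit next to the PEs starts the
    # next task, so only a small local overhead is paid per task.
    return tasks_per_time_unit(exec_time, local_overhead)


def serial_makespan(task_times):
    """Total time when independent tasks run one after another."""
    return sum(task_times)


def parallel_makespan(task_times):
    """Total time when each independent task runs on its own PE."""
    return max(task_times)


if __name__ == "__main__":
    # Arbitrary illustrative values in abstract time units.
    exec_time, one_way_delay, local_overhead = 10.0, 20.0, 1.0
    print("RISC-synchronized rate:", risc_round_trip_rate(exec_time, one_way_delay))
    print("Hardware-synchronized rate:", hardware_sync_rate(exec_time, local_overhead))
    print("Serial makespan:", serial_makespan([4.0, 6.0, 5.0]))
    print("Parallel makespan:", parallel_makespan([4.0, 6.0, 5.0]))
```

In this toy model the gain grows with the ratio of communication delay to task execution time, which is consistent with the abstract's emphasis on setups where the RISC is not closely attached to the PEs.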