An investigation of scalable SIMD I/O techniques with application to parallel JPEG compression
Journal of Parallel and Distributed Computing - Special issue on multimedia processing and technology
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Hardware-software co-design of embedded systems: the POLIS approach
Hardware-software co-design of embedded systems: the POLIS approach
A survey of CORDIC algorithms for FPGA based computers
FPGA '98 Proceedings of the 1998 ACM/SIGDA sixth international symposium on Field programmable gate arrays
Engineering of Reconfigurable Hardware/Software Objects
The Journal of Supercomputing
Compiler Optimizations for Adaptive EPIC Processors
EMSOFT '01 Proceedings of the First International Workshop on Embedded Software
Hardware/Software Co-Design for Data-Driven Xputer-based Accelerators
VLSID '97 Proceedings of the Tenth International Conference on VLSI Design: VLSI in Multimedia Applications
FCCM '04 Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
A Single Program Multiple Data Parallel Processing Platform for FPGAs
FCCM '04 Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Instruction set extension with shadow registers for configurable processors
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
An FPGA-based VLIW processor with custom hardware execution
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Integration, the VLSI Journal
Hi-index | 0.00 |
Configurable multiprocessor platforms consist of multiple soft processors configured on FPGA devices. They have become an attractive choice for implementing many computing applications. In addition to the various ways of distributing software execution among the multiple soft processors, the application designer can customize soft processors and the connections between them in order to improve the performance of the applications running on the multiprocessor platform. State-of-the-art design tools rely on low-level simulation to explore the various design trade-offs offered by configurable multiprocessor platforms. These low-level simulation based exploration techniques are too time-consuming and can be a major bottleneck to efficient design space exploration on these platforms. We propose a design space exploration technique for configurable multiprocessor platforms using arithmetic-level cycle-accurate hardware--software cosimulation. Arithmetic-level abstractions of the hardware and software execution platforms are created within the proposed cosimulation environment. The configurable multiprocessor platforms are described using these arithmetic-level abstractions. Hardware and software simulators are tightly integrated to concurrently simulate the arithmetic behavior of the multiprocessor platform. The simulation within the integrated simulators are synchronized to provide cycle-accurate simulation results for the complete multiprocessor platform. By doing so, we significantly speed up the cosimulation process for configurable multiprocessor platforms. Exploration of the various hardware-software design trade-offs provided by configurable multiprocessor platforms can be performed within the proposed cycle-accurate cosimulation environment. After the final designs are identified, the corresponding low-level implementations with the desired cycle-accurate arithmetic behavior are generated automatically. For illustrative purposes, we provide an implementation of our approach based on MATLAB/Simulink. We show the cosimulation of two numerical computation applications and one image-processing application on a popular configurable multiprocessor platform within the MATLAB/Simulink-based cosimulation environment. For these three applications, our arithmetic-level cosimulation approach leads to speed-ups in simulation time of up to more than 800x compared with the low-level simulation approaches. The designs of these applications identified using our arithmetic-level cosimulation approach achieve execution time speed-ups up to 5.6x, compared with other designs considered in our experiments.