The primary goal during synthesis of digital signal processing (DSP) circuits is to minimize the hardware area while meeting a minimum throughput constraint. In field-programmable gate array (FPGA) implementations, significant area savings can be achieved by using slower, more area-efficient circuit modules and/or by time-multiplexing faster, larger circuit modules. Unfortunately, manual exploration of this design space is impractical. In this paper, we introduce a design exploration methodology that identifies the lowest-cost pipelined FPGA implementation of an untimed synchronous data-flow graph by combining module selection with resource sharing in the context of pipeline scheduling. These techniques are applied together to minimize the area cost of the FPGA implementation while meeting a user-specified minimum throughput constraint. Two different algorithms are introduced for exploring the large design space. We show that even for small DSP algorithms, combining these techniques can offer significant area savings relative to applying any of them alone.
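The trade-off described above can be illustrated with a small sketch. It is not either of the paper's two algorithms: it is a simplified per-operation-type cost model that ignores data dependencies, scheduling, and multiplexer overhead. The module library, its area and latency figures, the operation counts, and all function names are hypothetical and chosen only for illustration. The throughput constraint is expressed as an initiation interval `ii` (clock cycles between successive input samples), and a module that occupies a unit for `cycles` cycles per operation is assumed usable only if `cycles <= ii`.

```python
# Hypothetical module library: op type -> list of (name, area, cycles per op).
# All numbers are made up for illustration.
LIBRARY = {
    "mul": [("mul_fast", 400, 1), ("mul_slow", 90, 4)],
    "add": [("add_fast", 60, 1), ("add_slow", 40, 2)],
}

# Hypothetical operation counts of a small data-flow graph.
OPS = {"mul": 8, "add": 7}


def instances_needed(n_ops, cycles, ii):
    """Units required to execute n_ops operations within ii cycles (integer ceiling)."""
    return (n_ops * cycles + ii - 1) // ii


def feasible(op, ii):
    """Modules that can complete one operation within the initiation interval."""
    return [m for m in LIBRARY[op] if m[2] <= ii]


def selection_only_area(ops, ii):
    """Module selection alone: one dedicated unit per operation, cheapest feasible module."""
    return sum(n * min(area for _, area, _ in feasible(op, ii))
               for op, n in ops.items())


def sharing_only_area(ops, ii):
    """Resource sharing alone: always the fastest module, time-multiplexed."""
    total = 0
    for op, n in ops.items():
        _, area, cycles = min(LIBRARY[op], key=lambda m: m[2])
        total += instances_needed(n, cycles, ii) * area
    return total


def combined_area(ops, ii):
    """Both together: per op type, pick the module whose shared instances cost least."""
    return sum(min(instances_needed(n, cycles, ii) * area
                   for _, area, cycles in feasible(op, ii))
               for op, n in ops.items())


if __name__ == "__main__":
    ii = 4  # one new input sample every 4 cycles
    print("selection only:", selection_only_area(OPS, ii))
    print("sharing only:  ", sharing_only_area(OPS, ii))
    print("combined:      ", combined_area(OPS, ii))
```

With these made-up numbers, the combined search keeps the slow multipliers (sharing gains nothing there, since each is busy for the whole interval) but shares fast adders, so it ends up cheaper than either technique applied on its own. A real flow would also have to check that a legal pipeline schedule exists for the chosen allocation.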