This paper addresses the design of communication and memory architectures for the massively parallel hardware multiprocessors needed to implement highly demanding applications. We demonstrated that, for massively parallel hardware multiprocessors, the traditionally used flat communication architectures and multi-port memories do not scale well, and that the influence of the memory and communication network on both throughput and circuit area dominates that of the processors. To resolve these problems and ensure scalability, we proposed designing highly optimized, application-specific hierarchical and/or partitioned communication and memory architectures by exploring and exploiting the regularity and hierarchy of the actual data flows of a given application. Furthermore, we proposed data distribution and related data mapping schemes for the shared (global) partitioned memories, with the aim of eliminating memory access conflicts and ensuring that our communication design strategies are applicable. We incorporated these architecture synthesis strategies into our quality-driven, model-based multiprocessor design method and the related automated architecture exploration framework. Using this framework, we performed a large series of experiments that demonstrate many important features of the synthesized memory and communication architectures. The experiments also demonstrate that our method and framework can efficiently synthesize well-scalable memory and communication architectures, even for high-end multiprocessors. Gains as high as 12 times in performance and 25 times in area can be obtained by using hierarchical communication networks instead of flat networks. However, at high parallelism levels, only the partitioned approach ensures scalability in performance.