The PICO-N system automatically synthesizes embedded nonprogrammable accelerators to be used as co-processors for functions expressed as loop nests in C. The output is synthesizable VHDL that defines the accelerator at the register-transfer level (RTL). The system generates a synchronous array of customized VLIW (very long instruction word) processors, together with their controller, local memory, and interfaces. The system also modifies the user's application software to make use of the generated accelerator. The user specifies the throughput to be achieved by giving the number of processors and their initiation interval. In experimental comparisons, PICO-N designs are slightly more costly than hand-designed accelerators with the same performance.
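As a rough illustration of the kind of input and throughput specification described above, the sketch below shows a small C loop nest and a back-of-the-envelope cycle estimate. Both are assumptions made for illustration and are not taken from PICO-N: the function `fir` is a hypothetical example kernel, and `estimate_cycles` is a hypothetical helper that uses an idealized steady-state model in which each processor starts a new loop iteration every initiation-interval cycles, ignoring pipeline fill, drain, and memory stalls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input kernel: a FIR filter written as a two-deep loop nest.
 * x must hold at least n_out + n_taps - 1 samples. */
void fir(const int *x, const int *w, int *y, size_t n_out, size_t n_taps)
{
    for (size_t i = 0; i < n_out; i++) {
        int acc = 0;
        for (size_t j = 0; j < n_taps; j++)
            acc += w[j] * x[i + j];
        y[i] = acc;
    }
}

/* Idealized estimate (an assumption, not the tool's cost model):
 * with `processors` units each starting one iteration every `ii` cycles,
 * the array sustains processors / ii iterations per cycle, so the total
 * is ceil(iterations * ii / processors) cycles. */
unsigned long estimate_cycles(unsigned long iterations,
                              unsigned processors,
                              unsigned ii)
{
    assert(processors > 0 && ii > 0);
    return (iterations * ii + processors - 1) / processors;
}
```

Under this simplified model, doubling the number of processors or halving the initiation interval halves the estimated cycle count, which is why the two parameters together can serve as a throughput specification.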