Soft processors are increasingly common in FPGA-based embedded systems, but scaling their performance remains a challenge. We propose extending soft processor instruction sets to support vector processing. The resulting system of vectorized software and soft vector processor hardware is (i) portable to any FPGA architecture and vector processor configuration, (ii) scalable to larger, higher-performance designs, and (iii) flexible, allowing the underlying vector processor to be customized to the needs of each application. Using our robust, verified, parameterized vector processor design and industry-standard EEMBC benchmarks, we evaluate the performance and area trade-offs of different soft vector processor configurations on an FPGA development platform with DDR SDRAM. We find that, on average, performance scales from 1.8x up to 6.3x for a vector processor design that saturates the capacity of our platform's Stratix 1S80 FPGA. We also automatically generate application-specific vector processors with reduced datapath width and instruction set support; combined, these reductions cut area by up to 70% (61% on average) without affecting performance.
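The portability claim can be illustrated with a minimal sketch of strip-mining, a standard way of writing vector code so that it does not depend on the hardware's vector length. This is not the instruction set or toolchain described in the abstract: the function names and the `mvl` parameter (a stand-in for a maximum vector length) are hypothetical, and the "vector instruction" is simulated with ordinary Python lists.

```python
def vector_add_stripmined(a, b, mvl):
    """Add two equal-length lists in strips of at most `mvl` elements.

    `mvl` is a hypothetical maximum vector length; each loop iteration
    stands in for one vector add issued to the hardware.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if mvl < 1:
        raise ValueError("mvl must be at least 1")
    out = []
    i = 0
    while i < len(a):
        vl = min(mvl, len(a) - i)  # vector length used for this strip
        # Simulated vector instruction: element-wise add over vl elements.
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out


def strips_needed(n, mvl):
    """Number of strips (simulated vector instructions) for n elements."""
    return -(-n // mvl)  # ceiling division


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    b = [10, 20, 30, 40, 50]
    # The same source code gives the same result for any vector length.
    print(vector_add_stripmined(a, b, mvl=2))
    print(vector_add_stripmined(a, b, mvl=64))
    print(strips_needed(len(a), 2), strips_needed(len(a), 64))
```

In this sketch the result is identical for every `mvl`; only the number of strips changes. That is the sense in which one vectorized program could run unmodified on differently sized vector processor configurations, with larger configurations needing fewer vector instructions for the same work.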