Computer Architecture; A Quantitative Approach
Computer Architecture; A Quantitative Approach
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors
MASCOTS '94 Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
Vector microprocessors
An FPGA-based VLIW processor with custom hardware execution
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
The microarchitecture of FPGA-based soft processors
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
FPGA-Based Vector Processing for Solving Sparse Sets of Equations
FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Application-specific customization of soft processor microarchitecture
Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
A Multithreaded Soft Processor for SoPC Area Reduction
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Runtime Partial Reconfiguration for Embedded Vector Processors
ITNG '07 Proceedings of the International Conference on Information Technology
Vector processing as a soft-core CPU accelerator
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
VESPA: portable, scalable, and flexible FPGA-based vector processors
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Scaling Soft Processor Systems
FCCM '08 Proceedings of the 2008 16th International Symposium on Field-Programmable Custom Computing Machines
Application Specific Customization and Scalability of Soft Multiprocessors
FCCM '09 Proceedings of the 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines
VEGAS: soft vector processor with scratchpad memory
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
A pipeline interleaved heterogeneous SIMD soft processor array architecture for MIMO-OFDM detection
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
Portable, flexible, and scalable soft vector processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Soft vector processors with streaming pipelines
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
Hi-index | 0.00 |
Embedded systems are often implemented on FPGA devices and 25% of the time include a soft processor--a processor built using the FPGA reprogrammable fabric. Because of their prevalence and flexibility, soft processors are compelling targets for customization--although current soft processors provide few architectural variations. Recent work has proposed augmenting soft processors with customizable vector processing support, enabling designers to easily scale performance by exploiting the data parallelism available in an application. However this approach provides only coarse-grain scaling, by successively doubling the number of vector datapaths for less than double the performance. In this work we further augment soft vector processors with more fine-grain architectural modifications: we add support for (i) vector chaining and (ii) heterogeneous vector lanes, allowing the soft vector processor to be customized to not only the data-level parallelism available in an application, but to the functional unit demand. We evaluate the area and wall clock performance with full hardware implementations on state-of-the-art FPGAs and find that chaining can provide between 15-45% average performance for less area than doubling the lanes, and that heterogeneous lanes can save 6-13% area with little or no performance loss in some cases. Finally, we implement 1200 soft vector processors variants and find that the peak performance per area compared to our base vector processor can be increased by an average of 13% and up to 34% when choosing the best variant per application.