Fine-grain performance scaling of soft vector processors

Authors:
Peter Yiannacouras; J. Gregory Steffan; Jonathan Rose
Affiliations:
University of Toronto, Toronto, Canada; University of Toronto, Toronto, Canada; University of Toronto, Toronto, Canada
Venue:
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Year:
2009

Abstract

Embedded systems are often implemented on FPGA devices and 25% of the time include a soft processor, that is, a processor built using the FPGA's reprogrammable fabric. Because of their prevalence and flexibility, soft processors are compelling targets for customization, although current soft processors provide few architectural variations. Recent work has proposed augmenting soft processors with customizable vector processing support, enabling designers to easily scale performance by exploiting the data parallelism available in an application. However, this approach provides only coarse-grain scaling: each step doubles the number of vector datapaths and yields less than double the performance. In this work we further augment soft vector processors with more fine-grain architectural modifications: we add support for (i) vector chaining and (ii) heterogeneous vector lanes, allowing the soft vector processor to be customized not only to the data-level parallelism available in an application, but also to its functional unit demand. We evaluate area and wall-clock performance with full hardware implementations on state-of-the-art FPGAs and find that chaining can provide an average performance improvement of 15-45% for less area than doubling the lanes, and that heterogeneous lanes can save 6-13% area with little or no performance loss in some cases. Finally, we implement 1200 soft vector processor variants and find that, when the best variant is chosen per application, peak performance per area relative to our base vector processor increases by an average of 13% and by up to 34%.