Field-programmable gate arrays (FPGAs) are increasingly used to implement embedded digital systems; however, the hardware design necessary to do so is time-consuming and tedious. The amount of hardware design can be reduced by employing a microprocessor for the less critical computation in the system. Often this microprocessor is implemented in the FPGA's reprogrammable fabric as a soft processor. Soft processors presently have simple architectures and moderate performance. Our goal is to scale the performance of existing soft processors, thereby expanding their suitability to more critical computation. To this end, we propose extending soft processors with vector extensions to exploit the abundant data parallelism found in many embedded kernels. Such a soft vector processor can execute these kernels much faster than a single-core soft processor, reducing the need for hardware implementations. We observe this improved execution speed through experimentation with the Vector Extended Soft Processor Architecture (VESPA), which is designed, implemented, and evaluated on real FPGA hardware. VESPA is shown to scale performance effectively up to 32 lanes, while providing substantial architectural flexibility that yields a fine-grained design space. With these characteristics, and with portability across FPGA devices, soft vector processors can provide exact-fit architectures that implement data-parallel workloads efficiently, and more easily than custom FPGA hardware design.