This paper proposes a low-complexity vector core, called LcVc, that executes both scalar and vector instructions on the same execution datapath. A unified register file in the decode stage stores both scalar operands and vector elements. Each cycle, the execution stage accepts a new set of operands and produces a new result. Rather than issuing a vector instruction (a 1-D operation) as a whole, the operations of a vector instruction are issued sequentially with the existing scalar issue hardware. In the first implementation of LcVc, all register loads and stores go through the data cache in the memory access stage at a rate of one element per clock cycle. The complete LcVc design is implemented in VHDL targeting the Xilinx Spartan-3E FPGA (device xc3s1600e-4-fg320). The implementation requires 1778 slices, with 538 slice flip-flops and 3706 4-input LUTs (1914 for logic and 1792 for RAMs). Our performance evaluation shows speedups over scalar execution of 2.3, 2.5, 1.9, and 3 for vector addition, vector scaling, SAXPY, and matrix-matrix multiplication, respectively. The hardware required to support the added vector capability is small (5%), which reduces the area per core and increases the number of cores available in a given chip area.
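The sequential-issue idea can be illustrated with a minimal Python sketch. It is not the LcVc design itself: the `VectorAdd` class, the 32-entry register file, and the assumption that a vector's elements occupy consecutive registers in the unified register file are all hypothetical choices made for illustration. The sketch only shows one vector addition being expanded into per-element operations that pass through a single scalar-style execution step, one element per cycle.

```python
from dataclasses import dataclass


@dataclass
class VectorAdd:
    """Hypothetical vector add; fields are base indices into one register file."""
    dst: int    # first register of the destination vector
    src1: int   # first register of the first source vector
    src2: int   # first register of the second source vector
    vlen: int   # number of elements


def issue_sequentially(instr):
    """Expand one vector add into per-element (dst, a, b) register triples."""
    for i in range(instr.vlen):
        yield (instr.dst + i, instr.src1 + i, instr.src2 + i)


def execute(regfile, instr):
    """Run each element operation on the shared register file; return cycles used."""
    cycles = 0
    for dst, a, b in issue_sequentially(instr):
        regfile[dst] = regfile[a] + regfile[b]  # same datapath as a scalar add
        cycles += 1                             # one element per cycle (assumed)
    return cycles


# Assumed layout: scalars and vector elements share one 32-entry register file.
regfile = [0] * 32
regfile[0:4] = [1, 2, 3, 4]
regfile[8:12] = [10, 20, 30, 40]

cycles = execute(regfile, VectorAdd(dst=16, src1=0, src2=8, vlen=4))
print(regfile[16:20], cycles)  # prints: [11, 22, 33, 44] 4
```

In this toy model the only vector-specific piece is the element-index expansion in `issue_sequentially`, which is in line with the abstract's claim that vector support adds little hardware on top of the scalar pipeline.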