Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks

Authors:
Christoforos Kozyrakis; David Patterson
Affiliations:
Stanford University; University of California at Berkeley
Venue:
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Year:
2002

Abstract

Multimedia processing on embedded devices requires an architecture that delivers high performance, low power consumption, reduced design complexity, and small code size. In this paper, we use EEMBC, an industrial benchmark suite, to compare the VIRAM vector architecture to superscalar and VLIW processors for embedded multimedia applications. The comparison covers the VIRAM instruction set, the vectorizing compiler, and the prototype chip that integrates a vector processor with DRAM main memory.

We demonstrate that executable code for VIRAM is up to 10 times smaller than VLIW code and comparable in size to x86 CISC code. The simple, cache-less VIRAM chip is 2 times faster than a 4-way superscalar RISC processor that runs at a clock frequency 5 times higher and consumes 10 times more power. VIRAM is also 10 times faster than cache-based VLIW processors. Even after manual optimization of the VLIW code and insertion of SIMD and DSP instructions, the single-issue VIRAM processor is 60% faster than 5-way to 8-way VLIW designs.
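The superscalar comparison combines three ratios (speed, clock frequency, power), and multiplying them out shows how much work VIRAM does per clock cycle and per watt. The sketch below is a back-of-envelope calculation, not a result from the paper. It assumes the abstract's rounded figures are exact, that performance equals work per cycle times clock frequency, and that "power" means average power over the benchmark run. The function names are hypothetical.

```python
# Back-of-envelope ratios derived from the abstract's rounded figures
# (VIRAM vs. the 4-way superscalar RISC processor). Treating these
# rounded numbers as exact is an assumption made for illustration.
SPEEDUP = 2       # VIRAM finishes the workload 2x faster
CLOCK_RATIO = 5   # superscalar clock frequency is 5x higher
POWER_RATIO = 10  # superscalar consumes 10x more (average) power


def work_per_cycle_ratio(speedup: float, clock_ratio: float) -> float:
    """Work per clock cycle of VIRAM relative to the superscalar.

    perf = work_per_cycle * frequency, so
    speedup = (wpc_V / wpc_SS) * (f_V / f_SS) = (wpc_V / wpc_SS) / clock_ratio,
    which gives wpc_V / wpc_SS = speedup * clock_ratio.
    """
    return speedup * clock_ratio


def perf_per_watt_ratio(speedup: float, power_ratio: float) -> float:
    """Performance per watt of VIRAM relative to the superscalar.

    For a fixed workload this also equals the energy ratio E_SS / E_V,
    because E = P * t and perf/W = 1 / (P * t).
    """
    return speedup * power_ratio


if __name__ == "__main__":
    print("work per cycle ratio:", work_per_cycle_ratio(SPEEDUP, CLOCK_RATIO))
    print("perf per watt ratio:", perf_per_watt_ratio(SPEEDUP, POWER_RATIO))
```

Under these assumptions, the single-issue VIRAM chip does roughly 10 times more work per cycle and delivers roughly 20 times more performance per watt than the superscalar processor. Because the inputs are rounded, these derived factors are approximate.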