This paper describes and evaluates three architectural methods for accomplishing data-parallel computation in a programmable embedded system. The well-studied Very Long Instruction Word (VLIW) and Single Instruction Multiple Packed Data (SIMpD) paradigms are compared, and the less common Single Instruction Multiple Disjoint Data (SIMdD) architecture is described and evaluated. A taxonomy is defined for data-level parallel architectures, and the data-access patterns of parallel computation are studied, with measurements presented for over 40 essential telecommunication and media kernels. While some algorithms exhibit data-level parallelism suited to packed vector computation, other kernels are shown to be scheduled most efficiently with more flexible vector models. This motivates exploration of non-traditional processor architectures for the embedded domain.
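The difference between packed and disjoint data access can be illustrated with a deliberately simplified cost model. The sketch below is not the paper's methodology and reproduces none of its measurements. It assumes a hypothetical 4-lane machine. Loading an aligned, contiguous group of four elements into a packed (SIMpD) register counts as one operation, and any other group costs one scalar insert per lane. A disjoint-data (SIMdD) machine is assumed to fetch any group of four independently addressed elements in one operation. All names (`LANES`, `packed_load_cost`, `disjoint_load_cost`, `bit_reversed`) are illustrative only.

```python
from typing import List, Sequence

LANES = 4  # hypothetical vector width (assumption, not from the paper)


def packed_load_cost(indices: Sequence[int]) -> int:
    """Toy SIMpD cost: one packed load per group of LANES indices that are
    contiguous and aligned (base, base+1, ..., base+LANES-1, base % LANES == 0);
    otherwise one scalar insert per lane to assemble the packed register."""
    cost = 0
    for g in range(0, len(indices), LANES):
        group = list(indices[g:g + LANES])
        base = group[0]
        contiguous = (
            len(group) == LANES
            and base % LANES == 0
            and group == list(range(base, base + LANES))
        )
        cost += 1 if contiguous else len(group)
    return cost


def disjoint_load_cost(indices: Sequence[int]) -> int:
    """Toy SIMdD cost: every lane names its own element, so each group of
    up to LANES indices costs one vector operation regardless of pattern."""
    return -(-len(indices) // LANES)  # ceiling division


def bit_reversed(n_bits: int) -> List[int]:
    """Bit-reversed index order, e.g. the reordering step of a radix-2 FFT."""
    return [int(format(i, f"0{n_bits}b")[::-1], 2) for i in range(1 << n_bits)]


if __name__ == "__main__":
    unit_stride = list(range(16))
    permuted = bit_reversed(4)
    print("unit stride :", packed_load_cost(unit_stride), disjoint_load_cost(unit_stride))
    print("bit-reversed:", packed_load_cost(permuted), disjoint_load_cost(permuted))
```

Under these assumptions, a unit-stride access pattern costs the same on both models, whereas a bit-reversed permutation leaves every packed group non-contiguous and forces per-lane assembly. This is a caricature of the general point that some kernels map poorly onto packed vectors; real SIMpD instruction sets offer shuffle and permute operations that this toy model ignores.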