Cost-Effective Hardware Acceleration of Multimedia Applications

Venue: ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Year: 2001

Citing 0
Cited 9

MediaBreeze: a decoupled architecture for accelerating multimedia applications
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops

Three-dimensional memory vectorization for high bandwidth media memory systems
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture

Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements
IEEE Transactions on Computers

The CSI multimedia architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Implications of Executing Compression and Encryption Applications on General Purpose Processors
IEEE Transactions on Computers

Scientific applications vs. SPEC-FP: a comparison of program behavior
Proceedings of the 20th annual international conference on Supercomputing

Vector processing as a soft-core CPU accelerator
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays

Vector Processing as a Soft Processor Accelerator
ACM Transactions on Reconfigurable Technology and Systems (TRETS)

Efficient multimedia coprocessor with enhanced SIMD engines for exploiting ILP and DLP
Parallel Computing

Quantified Score

Hi-index	0.01

Abstract

General-purpose microprocessors augmented with SIMD execution units enhance multimedia applications by exploiting data-level parallelism. However, supporting/overhead-related instructions (instructions necessary to feed the SIMD execution units, such as address generation, packing/unpacking, permute, loads/stores, and loop branches) dominate media instruction streams, accounting for 75-85% of the dynamic instructions. This leads to under-utilization of the SIMD execution units, resulting in a throughput that ranges between 1% and 12% of the peak throughput. We accelerate multimedia applications by providing explicit hardware support to eliminate or reduce the impact of the supporting/overhead-related instructions. Performance evaluation shows that such hardware can significantly improve performance over conventional SIMD-enhanced general-purpose processors (1.05x to 28x). In this paper, we investigate the cost of incorporating hardware for efficient execution of supporting/overhead-related instructions into a high-speed SIMD-enhanced general-purpose processor, and we examine the area, power, and timing tradeoffs. Our results indicate that the added hardware requires less than 10% of the SIMD execution units' chip area and 0.3% of the overall chip area, and that its power consumption is less than 1% of the total processor power. This is achieved without elongating the critical path of the processor.
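
To make the 75-85% overhead figure concrete, the short Python sketch below works out what it implies for the instruction mix: the share of dynamic instructions left for actual SIMD computation, and how many supporting/overhead instructions accompany each SIMD instruction on average. This is only arithmetic on the percentages quoted in the abstract, not the paper's evaluation model; the function names are hypothetical and chosen for illustration.

```python
# Illustrative arithmetic only: derives instruction-mix ratios from the
# overhead percentages quoted in the abstract. Function names are
# hypothetical and do not come from the paper.

def simd_fraction(overhead_fraction: float) -> float:
    """Share of dynamic instructions that are not supporting/overhead."""
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return 1.0 - overhead_fraction


def overhead_per_simd_instruction(overhead_fraction: float) -> float:
    """Average number of overhead instructions per non-overhead instruction."""
    return overhead_fraction / simd_fraction(overhead_fraction)


if __name__ == "__main__":
    # The two endpoints of the 75-85% range reported in the abstract.
    for overhead in (0.75, 0.85):
        print(
            f"overhead {overhead:.0%}: "
            f"useful share {simd_fraction(overhead):.0%}, "
            f"{overhead_per_simd_instruction(overhead):.2f} "
            f"overhead instructions per SIMD instruction"
        )
```

At the two ends of the quoted range, this gives 3 overhead instructions per SIMD instruction at 75% and roughly 5.7 at 85%, which is consistent with the abstract's observation that the SIMD units sit largely idle.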