MediaBreeze: a decoupled architecture for accelerating multimedia applications
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
Three-dimensional memory vectorization for high bandwidth media memory systems
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements
IEEE Transactions on Computers
The CSI multimedia architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Implications of Executing Compression and Encryption Applications on General Purpose Processors
IEEE Transactions on Computers
Scientific applications vs. SPEC-FP: a comparison of program behavior
Proceedings of the 20th annual international conference on Supercomputing
Vector processing as a soft-core CPU accelerator
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Vector Processing as a Soft Processor Accelerator
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Hi-index | 0.01 |
Abstract: General-purpose microprocessors augmented with SIMD execution units enhance multimedia applications by exploiting data level parallelism. However, supporting/ overhead related instructions (instructions necessary to feed the SIMD execution units such as address generation, packing/unpacking, permute, loads/stores, and loop branches) dominate media instruction streams accounting for 75-85% of the dynamic instructions. This leads to an under-utilization of SIMD execution units resulting in a throughput that ranges between 1-12% of the peak throughput. We accelerate multimedia applications by providing explicit hardware support to eliminate or reduce the impact of the supporting/overhead related instructions. Performance evaluation shows that such hardware can significantly improve performance over conventional SIMD enhanced general-purpose processors (1.05x to 28x). In this paper, we investigate the cost of incorporating hardware, for efficient execution of supporting/ overhead related instructions, into a high-speed SIMD enhanced general-purpose processor and perform area, power, and timing tradeoffs. Our results indicate that - the added hardware requires less than 10% SIMD execution units' chip area and 0.3% overall chip area, and power consumption is less than 1% of the total processor power. This is achieved without elongating the critical path of the processor.