Exploiting SIMD parallelism in DSP and multimedia algorithms using the AltiVec technology. ICS '99: Proceedings of the 13th International Conference on Supercomputing.
Wattch: a framework for architectural-level power analysis and optimizations. Proceedings of the 27th Annual International Symposium on Computer Architecture.
Digital Image Processing.
VIS Speeds New Media Processing. IEEE Micro.
An Efficient Algorithm for Out-of-Core Matrix Transposition. IEEE Transactions on Computers.
Measuring the Performance of Multimedia Instruction Sets. IEEE Transactions on Computers.
Computer Architecture: A Quantitative Approach.
Architectural enhancements for color image and video processing on embedded systems.
Color-Aware Instructions for Embedded Superscalar Processors. Journal of Signal Processing Systems.
Application-specific extensions of a processor provide an efficient mechanism to meet the growing performance demands of multimedia applications. This paper presents a color-aware instruction set extension (CAX) for embedded multimedia systems that supports vector processing of color image sequences. CAX supports parallel operations on two packed 16-bit (6:5:5) YCbCr (luminance-chrominance) pixels in a processor with a 32-bit datapath, providing greater concurrency and efficiency for color image and video processing. Unlike typical multimedia extensions (e.g., MMX, VIS, and MDMX), CAX exploits parallelism within the perceptually oriented YCbCr color space rather than relying solely on generic subword parallelism. Experimental results on identically configured, dynamically scheduled 4-way superscalar processors indicate that CAX outperforms MDMX (a representative MIPS multimedia extension) in both speedup (3.9× with CAX versus 2.1× with MDMX over the baseline) and energy reduction (68% to 83% with CAX versus 39% to 69% with MDMX relative to the baseline). More extensive simulations provide an in-depth analysis of CAX on machines with issue widths ranging from 1 to 16 instructions per cycle. The impact of combining CAX with loop unrolling is also presented.
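To make the "two packed 16-bit (6:5:5) YCbCr pixels in a 32-bit word" idea concrete, the following is a minimal software (SWAR) sketch, not the CAX instruction set itself. It assumes a hypothetical bit layout (Y in bits 15..10, Cb in bits 9..5, Cr in bits 4..0 of each 16-bit pixel, with the second pixel in the upper half of the word) and uses modular (wraparound) lane-wise addition; the actual CAX encoding and whether its arithmetic saturates are not given here. The helper names `pack_ycbcr655`, `pack2`, and `add_2x655` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 6:5:5 layout (an assumption, not the CAX spec):
 * Y in bits 15..10, Cb in bits 9..5, Cr in bits 4..0. */
static uint16_t pack_ycbcr655(unsigned y, unsigned cb, unsigned cr) {
    assert(y < 64u && cb < 32u && cr < 32u); /* field-width limits */
    return (uint16_t)(((y & 0x3Fu) << 10) | ((cb & 0x1Fu) << 5) | (cr & 0x1Fu));
}

/* Extract the three components of one 6:5:5 pixel. */
static void unpack_ycbcr655(uint16_t p, unsigned *y, unsigned *cb, unsigned *cr) {
    *y = (p >> 10) & 0x3Fu;
    *cb = (p >> 5) & 0x1Fu;
    *cr = p & 0x1Fu;
}

/* Two pixels per 32-bit word: p1 in the upper half, p0 in the lower half. */
static uint32_t pack2(uint16_t p1, uint16_t p0) {
    return ((uint32_t)p1 << 16) | p0;
}

/* Most significant bit of each of the six fields:
 * upper pixel bits 31, 25, 20; lower pixel bits 15, 9, 4. */
#define FIELD_MSBS 0x82108210u

/* Lane-wise modular add of six fields at once. Clearing each field's MSB
 * before adding means a carry can reach that MSB position but never cross
 * into the neighboring field; the XOR then restores the MSB sum bit and
 * discards the field's carry-out (wraparound modulo 2^width). */
static uint32_t add_2x655(uint32_t a, uint32_t b) {
    uint32_t low = (a & ~FIELD_MSBS) + (b & ~FIELD_MSBS);
    return low ^ ((a ^ b) & FIELD_MSBS);
}
```

With this layout, adding pixel pairs (Y, Cb, Cr) = (63, 31, 1) + (1, 1, 1) and (10, 3, 4) + (5, 2, 30) in one `add_2x655` call yields (0, 0, 2) and (15, 5, 2): each component wraps within its own 6- or 5-bit field. A dedicated instruction would perform this in one operation; doing it in software costs several ALU operations, which is part of the motivation for hardware support.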