An Empirical Study On the Vectorization of Multimedia Applications for Multimedia Extensions

Authors:
Gang Ren;Peng Wu;David Padua
Affiliations:
University of Illinois at Urbana-Champaign;IBM T.J. Watson Research Center, Yorktown Heights, NY;University of Illinois at Urbana-Champaign
Venue:
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Year:
2005

References: 20
Cited by: 14

Automatic translation of FORTRAN programs to vector form

ACM Transactions on Programming Languages and Systems (TOPLAS)
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Parallel Prefix Computation

Journal of the ACM (JACM)
Exploiting superword level parallelism with multimedia instruction sets

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Compilation techniques for multimedia processors

International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, Part 1
A vectorizing compiler for multimedia extensions

International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, Part 1
Monotonic evolution: an alternative to induction variable substitution for dependence analysis

ICS '01 Proceedings of the 15th international conference on Supercomputing
Automatic intra-register vectorization for the Intel architecture

International Journal of Parallel Programming
Array recovery and high-level transformations for DSP applications

ACM Transactions on Embedded Computing Systems (TECS)
A SIMD Vectorizing Compiler for Digital Signal Processing Algorithms

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Compiling for SIMD Within a Register

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Increasing and Detecting Memory Address Congruence

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Design and characterization of the Berkeley multimedia workload

Multimedia Systems
The Software Vectorization Handbook: Applying Intel Multimedia Extensions for Maximum Performance

The Software Vectorization Handbook: Applying Intel Multimedia Extensions for Maximum Performance
Vectorization for SIMD architectures with alignment constraints

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Superword-Level Parallelism in the Presence of Control Flow

Proceedings of the international symposium on Code generation and optimization
Efficient SIMD Code Generation for Runtime Alignment and Length Conversion

Proceedings of the international symposium on Code generation and optimization
Formal loop merging for signal transforms

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
A comparative study of automatic vectorizing compilers

Parallel Computing
Automatic detection of saturation and clipping idioms

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing

Vectorization for SIMD architectures with alignment constraints

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Efficient SIMD Code Generation for Runtime Alignment and Length Conversion

Proceedings of the international symposium on Code generation and optimization
Optimizing data permutations for SIMD devices

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Auto-vectorization of interleaved data for SIMD

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Vector LLVA: a virtual vector instruction set for media processing

Proceedings of the 2nd international conference on Virtual execution environments
Challenges in exploitation of loop parallelism in embedded applications

CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Exploiting SIMD Parallelism with the CGiS Compiler Framework

Languages and Compilers for Parallel Computing
On the exploitation of loop-level parallelism in embedded applications

ACM Transactions on Embedded Computing Systems (TECS)
Optimizing techniques for saturated arithmetic with first-order linear recurrence

Proceedings of the 2009 ACM symposium on Applied Computing
Compiler-Based Performance Evaluation of an SIMD Processor with a Multi-Bank Memory Unit

Journal of Signal Processing Systems
Mapping streaming languages to general purpose processors through vectorization

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Portable Parallel Programs using architecture-aware libraries

Proceedings of the 27th Annual ACM Symposium on Applied Computing
Compiler supports for VLIW DSP processors with SIMD intrinsics

Concurrency and Computation: Practice & Experience
Instruction selection for subword level parallelism optimizations for application specific instruction processors

ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications

Abstract

Multimedia extensions (MME) are architectural extensions to general-purpose processors that boost the performance of multimedia workloads. Today, in-line assembly code, intrinsic functions, and library routines are the most common means of programming these extensions. A promising alternative is to exploit vectorization technology to automatically generate MME instructions from programs written in standard high-level languages. However, despite the early success of automatic vectorization for traditional vector supercomputers, state-of-the-art vectorizing compilers for multimedia extensions have yet to demonstrate their effectiveness, especially on multimedia workloads. In this paper, we conducted an empirical study on the vectorization of media processing programs for multimedia extensions. Our study identified several new issues that are not handled by traditional vectorizers. These issues arise partly from the unique features of MME architectures and partly from the characteristics of media processing applications. We proposed several techniques to address some of these issues, and we assessed their effectiveness by manually applying them to a set of multimedia programs. In addition, we found that further optimizations after vectorization are essential to benefit from multimedia extensions. In our experiments, 23 of 34 core procedures from the Berkeley Multimedia Workload (BMW) were manually vectorized, and 14 of these procedures achieved speedups ranging from 1.10 to 3.39 on a Pentium 4 processor.