Improving superword level parallelism support in modern compilers

Authors:
Christian Tenllado; Luis Piñuel; Manuel Prieto; Francisco Tirado; F. Catthoor
Affiliations:
Universidad Complutense, Madrid, Spain; Universidad Complutense, Madrid, Spain; Universidad Complutense, Madrid, Spain; Universidad Complutense, Madrid, Spain; Interuniversity MicroElectronic Center (IMEC), Leuven, Belgium
Venue:
CODES+ISSS '05: Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Year:
2005

References (7 listed):
Supercompilers for parallel and vector computers.
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts. IEEE Transactions on Parallel and Distributed Systems.
Exploiting superword level parallelism with multimedia instruction sets. PLDI '00: Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation.
Compilation Techniques for Multimedia Processors. International Journal of Parallel Programming.
Internet Streaming SIMD Extensions. Computer.
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures. Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques.
Power Efficient Processor Architecture and The Cell Processor. HPCA '05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture.

Cited by (4):
Compiling for an indirect vector register architecture. Proceedings of the 5th Conference on Computing Frontiers.
Outer-loop vectorization: revisited for short SIMD architectures. Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques.
A compiler framework for extracting superword level parallelism. Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation.
Hybrid type legalization for a sparse SIMD instruction set. ACM Transactions on Architecture and Code Optimization (TACO).

Abstract

Multimedia vector instruction sets are becoming ubiquitous in most of the embedded systems used for multimedia, networking and communications. However, current compiler technology does not efficiently exploit the data parallelism inherent in many signal processing and multimedia applications. In this paper, we explore the automatic vectorization of embedded applications. In particular, we focus on algorithms that apply the same computations to a set of signals processed simultaneously. This set of signals is usually represented as a 2D array in which each row is an input signal that must be filtered in some way. A motivating example, inspired by VoIP processing, shows that state-of-the-art vectorizing compilers exploit the data parallelism inherent in this kind of application inefficiently. One of the main reasons is that such applications have inner loops that carry all the dependences and outer loops with strided memory accesses. We propose a modification of the Superword Level Parallelism (SLP) compiler proposed in [9] that tries to overcome these problems. Experimental results show that our approach clearly outperforms commercial compilers.
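The loop structure described above can be sketched in C. This is a minimal illustration under stated assumptions, not the authors' modified SLP algorithm: a first-order recursive filter, y[t] = a*y[t-1] + x[t], stands in for the VoIP-inspired example (which the abstract does not detail), and the names NSIG, LEN, VL, filter_rows, filter_rows_packed and max_abs_diff are hypothetical. The baseline shows why inner-loop vectorization fails (the recurrence over t) and why the parallel outer loop is awkward (consecutive signals are LEN elements apart). The packed version shows the kind of code an outer-loop, SLP-style transformation aims for: VL consecutive signals processed in lock-step as one superword operation, after a transpose that makes their samples contiguous.

```c
#include <assert.h>

/* Hypothetical sizes chosen for illustration only. */
#define NSIG 8   /* number of signals (rows of the 2D array) */
#define LEN  16  /* samples per signal (columns) */
#define VL   4   /* superword width in lanes, e.g. four floats per 128-bit register */

/* Baseline: an assumed first-order recursive filter, y[t] = a*y[t-1] + x[t],
   applied to every row. The inner loop over t carries the dependence; the
   parallel outer loop over s touches elements LEN floats apart in memory. */
void filter_rows(float x[NSIG][LEN], float y[NSIG][LEN], float a)
{
    for (int s = 0; s < NSIG; s++) {
        float prev = 0.0f;
        for (int t = 0; t < LEN; t++) {
            prev = a * prev + x[s][t];
            y[s][t] = prev;
        }
    }
}

/* Packed sketch: VL consecutive outer-loop iterations have isomorphic,
   independent statements, so they are executed together as one superword
   operation (the lane loop below). The data are transposed first so that
   the VL operands needed at time t are adjacent in memory. */
void filter_rows_packed(float x[NSIG][LEN], float y[NSIG][LEN], float a)
{
    float xt[LEN][NSIG], yt[LEN][NSIG];
    assert(NSIG % VL == 0); /* this sketch handles whole superwords only */

    for (int s = 0; s < NSIG; s++)
        for (int t = 0; t < LEN; t++)
            xt[t][s] = x[s][t];

    for (int s0 = 0; s0 < NSIG; s0 += VL) {
        float prev[VL] = {0.0f};
        for (int t = 0; t < LEN; t++) {
            /* One superword step: VL independent lanes, unit-stride loads/stores. */
            for (int l = 0; l < VL; l++) {
                prev[l] = a * prev[l] + xt[t][s0 + l];
                yt[t][s0 + l] = prev[l];
            }
        }
    }

    for (int s = 0; s < NSIG; s++)
        for (int t = 0; t < LEN; t++)
            y[s][t] = yt[t][s];
}

/* Largest absolute element-wise difference, used to compare the two versions. */
float max_abs_diff(float p[NSIG][LEN], float q[NSIG][LEN])
{
    float m = 0.0f;
    for (int s = 0; s < NSIG; s++)
        for (int t = 0; t < LEN; t++) {
            float d = p[s][t] - q[s][t];
            if (d < 0.0f) d = -d;
            if (d > m) m = d;
        }
    return m;
}
```

Each lane applies the same operations in the same order as the baseline does for its signal, so the two versions should compute the same values. The explicit transpose and the fixed VL are simplifications for readability; in practice the lane loop would map to a single SIMD instruction (through an auto-vectorizer or intrinsics), and the cost of reorganizing the data has to be weighed against the gain from unit-stride superword accesses.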