Vectorizing for a SIMdD DSP architecture

Authors:
Dorit Naishlos;Marina Biberstein;Shay Ben-David;Ayal Zaks
Affiliations:
Haifa University Campus, Haifa, Israel;Haifa University Campus, Haifa, Israel;Haifa University Campus, Haifa, Israel;Haifa University Campus, Haifa, Israel
Venue:
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Year:
2003


Abstract

The Single Instruction Multiple Data (SIMD) model for fine-grained parallelism was recently extended to support SIMD operations on disjoint vector elements. In this paper we demonstrate how SIMdD (SIMD on disjoint data) supports effective vectorization of digital signal processing (DSP) benchmarks, by facilitating data reorganization and reuse. In particular, we show that this model can be adopted by a compiler to achieve near-optimal performance for important classes of kernels.
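
As a rough illustration of what "SIMD on disjoint data" can mean, the following Python sketch models a vector operation whose lanes each read an independently indexed element, and applies it to a small FIR kernel. It is a toy model under stated assumptions, not the architecture or compiler described in the paper: the 4-lane width, the flat `element_file`, and the names `gather`, `fir_simdd`, and `fir_reference` are all hypothetical and chosen only for this example.

```python
LANES = 4  # assumed number of SIMD lanes (hypothetical, not from the paper)


def gather(element_file, indices):
    """Read one element per lane; the indices need not be contiguous."""
    return [element_file[i] for i in indices]


def fir_simdd(x, h):
    """Compute y[n] = sum_k h[k] * x[n + k], LANES outputs at a time.

    Coefficients and samples sit in one flat element file. Per-lane
    indices express a broadcast of h[k] and a sliding (unaligned) window
    over x, so no separate shuffle or realignment step is modeled.
    """
    taps = len(h)
    n_out = len(x) - taps + 1
    if n_out <= 0 or n_out % LANES != 0:
        raise ValueError("output length must be a positive multiple of LANES")

    element_file = list(h) + list(x)  # hypothetical layout: h first, then x
    x_base = taps

    y = []
    for n in range(0, n_out, LANES):
        acc = [0] * LANES
        for k in range(taps):
            coeff_idx = [k] * LANES  # every lane reads the same coefficient
            data_idx = [x_base + n + k + lane for lane in range(LANES)]
            c = gather(element_file, coeff_idx)
            d = gather(element_file, data_idx)  # samples reused across taps
            acc = [a + ci * di for a, ci, di in zip(acc, c, d)]
        y.extend(acc)
    return y


def fir_reference(x, h):
    """Scalar reference used only to check the sketch."""
    return [sum(h[k] * x[n + k] for k in range(len(h)))
            for n in range(len(x) - len(h) + 1)]
```

For instance, `fir_simdd([1, 2, 3, 4, 5, 6], [1, 2, 3])` yields the same four outputs as `fir_reference` on the same inputs. The point of the sketch is that each sample is loaded into the element file once and then read by different lanes at different taps, which is one way to picture the data reorganization and reuse the abstract refers to.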