The Single Instruction Multiple Data (SIMD) model for fine-grained parallelism was recently extended to support SIMD operations on disjoint vector elements. In this paper we demonstrate how SIMdD (SIMD on disjoint data) supports effective vectorization of digital signal processing (DSP) benchmarks, by facilitating data reorganization and reuse. In particular, we show that this model can be adopted by a compiler to achieve near-optimal performance for important classes of kernels.
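
As a rough illustration of what operating on disjoint vector elements can mean, the sketch below contrasts a conventional SIMD add, whose operands are adjacent elements, with an add in which every lane selects its own elements through an index vector. It is only a toy model under stated assumptions: the flat element file, the lane count of 4, and the function names `simd_add_contiguous` and `simdd_add` are hypothetical and are not taken from the paper or from any real instruction set.

```python
# Toy model of SIMD on disjoint data. All names and sizes are hypothetical.
LANES = 4  # assumed vector width for illustration


def simd_add_contiguous(elems, dst, a, b):
    """Conventional SIMD add: each operand is LANES adjacent elements."""
    for lane in range(LANES):
        elems[dst + lane] = elems[a + lane] + elems[b + lane]


def simdd_add(elems, dst_idx, a_idx, b_idx):
    """Disjoint-data add: each lane names its own source and target element."""
    # Read all sources before writing, so lanes behave as if executed together.
    results = [elems[a_idx[lane]] + elems[b_idx[lane]] for lane in range(LANES)]
    for lane in range(LANES):
        elems[dst_idx[lane]] = results[lane]


def demo():
    # Contiguous operands: elements 0..3 plus elements 4..7.
    elems = list(range(16))
    simd_add_contiguous(elems, dst=8, a=0, b=4)
    contiguous = elems[8:12]

    # Disjoint operands: a stride-2 gather plus one element reused by all lanes,
    # with no separate permute or broadcast step.
    elems = list(range(16))
    simdd_add(elems, dst_idx=[12, 13, 14, 15], a_idx=[0, 2, 4, 6], b_idx=[3, 3, 3, 3])
    disjoint = elems[12:16]
    return contiguous, disjoint
```

In this model the strided access and the reuse of a single element are expressed purely through the index vectors, which is one way to read the abstract's point that the model facilitates data reorganization and reuse.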