A vectorizing Fortran compiler. IBM Journal of Research and Development.
Automatic translation of FORTRAN programs to vector form. ACM Transactions on Programming Languages and Systems (TOPLAS).
Improving register allocation for subscripted variables. Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation (PLDI '90).
Advanced compiler design and implementation.
A vectorizing compiler for multimedia extensions. International Journal of Parallel Programming, Special Issue on Instruction-Level Parallelism and Parallelizing Compilation, Part 1.
Optimizing compilers for modern architectures: a dependence-based approach.
High Performance Compilers for Parallel Computing.
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures. Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques.
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications. Proceedings of the 1997 European Conference on Design and Test (EDTC '97).
Compiler and microarchitecture mechanisms for exploiting registers to improve memory performance.
Vectorizing for a SIMdD DSP architecture. Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems.
A new look at exploiting data parallelism in embedded systems. Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems.
The Software Vectorization Handbook: Applying Intel Multimedia Extensions for Maximum Performance.
An innovative low-power high-performance programmable signal processor for digital communications. IBM Journal of Research and Development.
Improving superword level parallelism support in modern compilers. Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '05).
An integrated simdization framework using virtual vectors. Proceedings of the 19th Annual International Conference on Supercomputing.
VICTORIA: VMX indirect compute technology oriented towards in-line acceleration. Proceedings of the 3rd Conference on Computing Frontiers.
Rotating Register Allocation for Enhanced Pipeline Scheduling. Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT '07).
Outer-loop vectorization: revisited for short SIMD architectures. Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques.
SARA: StreAm register allocation. Proceedings of the 7th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '09).
The iVMX architecture contains a novel vector register file of up to 4096 vector registers, accessed indirectly through a mapping mechanism that preserves compatibility with the VMX architecture and offers the potential for dramatic performance benefits [7]. The large number of vector registers and the unique indirection mechanism make these registers challenging for a compiler to use efficiently. The indirection mechanism makes spatial locality among registers, and the interaction between destination and source operands, central concerns during register allocation, while the many vector registers call for aggressive automatic vectorization. This work is a first step toward addressing the compilability of iVMX, following the presentation and validation of its architectural aspects [7]. In this paper we present several compilation approaches for handling the mapping mechanism, together with an outer-loop vectorization transformation developed to promote the use of many vector registers. We modified an existing register allocator to target all available registers and added a post-pass that renames live ranges to account for spatial locality and for the interaction among operand types. We use an FIR filter to demonstrate the effectiveness of these techniques against a version hand-optimized for iVMX. Initial results show that we can reduce map-management overhead to 29% of the total instruction count, compared with 22% for the hand-optimized version and 49% for a naive scheme, while outperforming an equivalent VMX implementation by a factor of 2.
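To make "map-management overhead" and the value of a locality-aware renaming post-pass concrete, the sketch below counts map-update instructions for an instruction trace under a deliberately simplified, hypothetical model. It is not the actual iVMX encoding. We assume each operand slot (destination, source) has its own map selecting a window of 32 consecutive physical registers; the window size is chosen because VMX instructions use 5-bit register fields. Any instruction whose operand lies outside its slot's current window needs one map update first. The names `count_map_updates`, `overhead`, `spread` and `packed` are illustrative only.

```python
# Hypothetical model (NOT the real iVMX mechanism): each operand slot has its
# own map selecting a window of WINDOW consecutive physical registers out of
# NUM_PHYS. VMX register fields are 5 bits wide, hence a 32-register window.
NUM_PHYS = 4096
WINDOW = 32
SLOTS = ("dst", "src")


def count_map_updates(trace, assign):
    """Count map-update instructions needed to run `trace` in order.

    trace:  list of (dst, src) pairs of virtual registers (live ranges)
    assign: dict mapping each virtual register to a physical register
    """
    base = {slot: None for slot in SLOTS}  # no window selected yet
    updates = 0
    for dst, src in trace:
        for slot, vreg in zip(SLOTS, (dst, src)):
            phys = assign[vreg]
            assert 0 <= phys < NUM_PHYS
            window_base = phys - phys % WINDOW
            if base[slot] != window_base:
                base[slot] = window_base  # costs one map-update instruction
                updates += 1
    return updates


def overhead(trace, assign):
    """Map updates as a fraction of the total instruction count."""
    updates = count_map_updates(trace, assign)
    return updates / (updates + len(trace))


# A chain v(i+1) = op(v(i)) over 9 live ranges: 8 compute instructions.
trace = [(i + 1, i) for i in range(8)]
spread = {v: 64 * v for v in range(9)}  # every live range in its own window
packed = {v: v for v in range(9)}       # all live ranges in one window

print(count_map_updates(trace, spread), count_map_updates(trace, packed))  # 16 2
print(f"{overhead(trace, spread):.1%} vs {overhead(trace, packed):.1%}")   # 66.7% vs 20.0%
```

In the spread assignment, each value is written through the destination map and later read through the source map, so both maps must be moved for every instruction. Renaming the same live ranges into one window removes all but the two initial updates. This illustrates, under the stated assumptions, why spatial locality and the interaction between destination and source operands matter.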
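Outer-loop vectorization can be illustrated on an FIR filter written as y[i] = sum over k of c[k] * x[i + k]. The sketch below strip-mines the outer loop over outputs `i` by the vector length, so each lane computes a different output, and keeps the inner loop over taps `k` sequential. No horizontal reduction is then needed at the end. Vector operations are emulated with Python lists of length `VL = 4`, matching four 32-bit floats in a 128-bit VMX register. The function names are hypothetical, and the code is a generic illustration of the technique rather than the transformation implemented in the paper.

```python
VL = 4  # lanes per vector: four 32-bit floats in a 128-bit VMX register


def fir_scalar(x, c, n):
    """Reference FIR: y[i] = sum_k c[k] * x[i + k] for i in range(n)."""
    return [sum(c[k] * x[i + k] for k in range(len(c))) for i in range(n)]


def fir_outer_vectorized(x, c, n):
    """Outer-loop vectorized FIR, emulating VL-wide vector ops with lists.

    The outer i loop is strip-mined by VL so each lane computes a different
    output; the inner tap loop stays sequential and needs no reduction.
    Assumes n is a multiple of VL and len(x) >= n + len(c) - 1.
    """
    assert n % VL == 0
    # Coefficient splats are invariant in i; with a large register file they
    # could all stay resident in vector registers across the whole i loop.
    splats = [[ck] * VL for ck in c]
    y = []
    for i in range(0, n, VL):
        acc = [0.0] * VL                      # vector accumulator
        for k, ck in enumerate(splats):
            xv = x[i + k : i + k + VL]        # (possibly unaligned) vector load
            acc = [a + cc * xx for a, cc, xx in zip(acc, ck, xv)]  # multiply-add
        y.extend(acc)                         # store VL outputs
    return y


x = [float(v) for v in range(11)]  # n + len(c) - 1 inputs
c = [0.5, 0.25, 0.25, 1.0]
print(fir_outer_vectorized(x, c, 8) == fir_scalar(x, c, 8))  # True
```

Vectorizing the inner tap loop instead would require summing across lanes for every output. The outer-loop form trades that reduction for more live vector values (splats and accumulators), which is the kind of register pressure a large register file can absorb.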