Code generation using tree matching and dynamic programming
ACM Transactions on Programming Languages and Systems (TOPLAS)
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Code Optimization Techniques for Embedded Processors: Methods, Algorithms, and Tools
Code Optimization Techniques for Embedded Processors: Methods, Algorithms, and Tools
Automatic intra-register vectorization for the Intel architecture
International Journal of Parallel Programming
Vectorization for SIMD architectures with alignment constraints
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Generation of permutations for SIMD processors
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Hi-index | 0.00 |
SIMD instructions are often implemented in modern multimedia oriented processors. Although SIMD instructions are useful for many digital signal processing applications, most compilers do not exploit SIMD instructions. The difficulty in the utilization of SIMD instructions stems from data parallelism in registers. In assembly code generation, the positions of data in registers must be noted. A technique of generating pack instructions which pack or reorder data in registers is essential for exploitation of SIMD instructions. This paper presents a code generation technique for SIMD instructions with pack instructions. SIMD instructions are generated by finding and grouping the same operations in programs. After the SIMD instruction generation, pack instructions are generated. In the pack instruction generation, Multi-valued Decision Diagram (MDD) is introduced to represent and to manipulate sets of packed data. Experimental results show that our code generation technique can generate assembly code with SIMD and pack instructions performing complex repacking of 8 packed data in registers for a commercial VLIW processor with 6 pack instructions and achieved speedup ratio of up to 7.7.