Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
A text-compression-based method for code size minimization in embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Bidwidth analysis with application to silicon compilation
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Compilation techniques for multimedia processors
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, Part 1
A vectorizing compiler for multimedia extensions
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, Part 1
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Measuring the Performance of Multimedia Instruction Sets
IEEE Transactions on Computers
Transformatiing and Parallelizing ANSI C Programs using Pattern Recognition
HPCN Europe '99 Proceedings of the 7th International Conference on High-Performance Computing and Networking
Compiling for SIMD Within a Register
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Designing the Agassiz Compiler for Concurrent Multithreaded Architectures
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Design and characterization of the Berkeley multimedia workload
Multimedia Systems
Automatic detection of saturation and clipping idioms
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Optimizing compiler for shared-memory multiple SIMD architecture
Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
Optimizing techniques for saturated arithmetic with first-order linear recurrence
Proceedings of the 2009 ACM symposium on Applied Computing
Data pipeline optimization for shared memory multiple-SIMD architecture
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Hi-index | 0.00 |
Modern processors' multimedia extensions (MME) provide SIMD ISAs to boost the performance of typical operations in multimedia applications. However, automatic vectorization support for them is not very mature. The key difficulty is how to vectorize those SIMD-ISA-supported idioms in source code in an efficient and general way. In this paper, we introduce a powerful and ex-tendable recognition engine to solve this problem, which only needs a small amount of rules to recognize many such idioms and generate efficient SIMD in-structions. We integrated this engine into the classic vectorization framework and obtained very good performance speedup for some real-life applications.