Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Interprocedural dependence analysis and parallelization
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Supercompilers for parallel and vector computers
Supercompilers for parallel and vector computers
Loop distribution with arbitrary control flow
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A practical algorithm for exact array dependence analysis
Communications of the ACM
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
Digital video processing
ACM Computing Surveys (CSUR)
Advanced compiler design and implementation
Advanced compiler design and implementation
Dependence Analysis for Supercomputing
Dependence Analysis for Supercomputing
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
VIS Speeds New Media Processing
IEEE Micro
Subword Parallelism with MAX-2
IEEE Micro
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Incorporating Intel® MMX$^{\rm TM}$ technology into a Java$^{\rm TM}$ JIT compiler$^{1}$
Scientific Programming
Vector Pascal an array language for multimedia code
APL '02 Proceedings of the 2002 conference on APL: array processing languages: lore, problems, and applications
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements
IEEE Transactions on Computers
Vectorization for SIMD architectures with alignment constraints
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Compiler based exploration of DSP energy savings by SIMD operations
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
An extended ANSI C for processors with a multimedia extension
International Journal of Parallel Programming
An integrated simdization framework using virtual vectors
Proceedings of the 19th annual international conference on Supercomputing
Vectorization past dependent branches through speculation
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
In this paper, we present an implementation of a vectorizing C compiler for Intel's MMX (Multimedia Extension). This compiler would identify data parallel sections of the code using scalar and array dependence analysis. To enhance the scope for application of the subword semantics, our compiler performs several code transformations. These include strip mining, scalar expansion, grouping and reduction, and distribution. Thereafter inline assembly instructions corresponding to the data parallel sections are generated. We have used the Stanford University Intermediate Format (SUIF), a public domain compiler tool, for our implementation. We evaluated the performance of the code generated by our compiler for a number of benchmarks. Initial performance results reveal that our compiler generated code produces a reasonable performance improvement (speedup of 2 to 6.5) over the the code generated without the vectorizing transformations/inline assembly. In certain cases, the performance of the compiler generated code is within 85% of the hand-tuned code for MMX architecture.