Java is one of the most popular programming languages in software development today, but its adoption in areas such as high-performance computing, gaming, and media processing is not as widespread as in general-purpose computing. A major drawback preventing wider adoption in those areas is its lower performance compared with traditional or domain-specific languages. This paper describes two approaches to improving Java's usability in those areas by introducing vector processing capability to Java. The first approach provides a Java vectorization interface (JVI) that developers can program against to explicitly expose a program's data parallelism. The second approach uses automatic vectorization to generate vector instructions for Java programs, and does not require programmers to modify the original source code. We evaluate the two vectorization approaches with the SPECjvm2008 benchmark. The performance of scimark.fft and scimark.lu improves by up to 55% and 107%, respectively, when running in a single thread. We also investigate some factors that affect the benefit of vectorization, including memory bus bandwidth and the superscalar micro-architecture.
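To make the contrast between the two approaches concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not show the JVI API, so the `Vec4f` class and its `load`, `add`, and `store` methods are hypothetical stand-ins invented for illustration, and the lane width of 4 is an assumption. `addScalar` is the kind of unmodified loop an automatic vectorizer might target. `addExplicit` shows how a programmer might expose the same data parallelism by hand, with a scalar tail loop for leftover elements.

```java
import java.util.Arrays;

public class VectorizationSketch {

    /** Hypothetical 4-lane float vector. This is NOT the paper's JVI API. */
    public static final class Vec4f {
        public static final int LANES = 4; // assumed lane count
        private final float[] lanes = new float[LANES];

        /** Reads LANES consecutive floats starting at index i. */
        public static Vec4f load(float[] src, int i) {
            Vec4f v = new Vec4f();
            System.arraycopy(src, i, v.lanes, 0, LANES);
            return v;
        }

        /** Lane-wise addition; a real vector unit would do this in one instruction. */
        public Vec4f add(Vec4f other) {
            Vec4f r = new Vec4f();
            for (int k = 0; k < LANES; k++) {
                r.lanes[k] = lanes[k] + other.lanes[k];
            }
            return r;
        }

        /** Writes the LANES values back starting at index i. */
        public void store(float[] dst, int i) {
            System.arraycopy(lanes, 0, dst, i, LANES);
        }
    }

    /** Plain scalar loop: the form an automatic vectorizer would take as input. */
    public static void addScalar(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    /** Explicit style: the programmer strip-mines the loop into vector-sized chunks. */
    public static void addExplicit(float[] a, float[] b, float[] c) {
        int i = 0;
        int limit = c.length - (c.length % Vec4f.LANES);
        for (; i < limit; i += Vec4f.LANES) {
            Vec4f.load(a, i).add(Vec4f.load(b, i)).store(c, i);
        }
        // Scalar tail for elements that do not fill a whole vector.
        for (; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f, 6f, 7f};
        float[] b = {10f, 20f, 30f, 40f, 50f, 60f, 70f};
        float[] c1 = new float[a.length];
        float[] c2 = new float[a.length];
        addScalar(a, b, c1);
        addExplicit(a, b, c2);
        System.out.println(Arrays.equals(c1, c2));
    }
}
```

This sketch emulates the vector operations in ordinary Java, so it illustrates only the programming style and yields no speedup by itself. Any gain of the kind reported above would depend on a compiler mapping such operations onto hardware vector instructions.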