Loop Parallelisation for the Jikes RVM

Authors:
Jisheng Zhao; Ian Rogers; Chris Kirkham; Ian Watson
Affiliations:
University of Manchester, UK (all authors)
Venue:
PDCAT '05: Proceedings of the Sixth International Conference on Parallel and Distributed Computing Applications and Technologies
Year:
2005

Abstract

Increasing the number of instructions executed in parallel has helped improve processor performance, but the technique has limits. Executing code on parallel threads and processors is less limited, but most computer programs tend to be serial in nature. This paper presents a compiler optimisation that parallelises loops at run time inside a JVM, thereby increasing the number of threads. We show SPECjvm benchmark results for this optimisation. On a current desktop processor the parallelised code is slower than the serial code because of thread creation costs; with these costs removed, however, it outperforms the serial code. We measure the threading costs and discuss how a future computer architecture will make this optimisation feasible, exploiting thread-level parallelism instead of instruction-level and/or vector parallelism.
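The following Java sketch gives a rough idea of the transformation. It splits a loop with independent iterations into contiguous chunks and runs each chunk on a freshly created thread. It is a hypothetical source-level analogue that assumes a simple element-wise loop. The class and method names (`LoopParallelSketch`, `addSerial`, `addParallel`) are invented for illustration. The paper's optimisation works inside the Jikes RVM compiler at run time rather than on Java source. New threads are created on every call on purpose, because that is where the thread creation cost described above is paid.

```java
import java.util.Arrays;

/**
 * Hypothetical source-level sketch of loop parallelisation: the iteration
 * space of a loop with independent iterations is split into contiguous
 * chunks, one per thread. Not the Jikes RVM implementation.
 */
public class LoopParallelSketch {

    // Serial loop: no iteration depends on another (no loop-carried dependence).
    static void addSerial(int[] a, int[] b, int[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    // Parallel loop: a fresh thread is created per chunk on every call,
    // so thread creation cost is paid each time the loop executes.
    static void addParallel(int[] a, int[] b, int[] c, int nThreads)
            throws InterruptedException {
        int n = c.length;
        int chunk = (n + nThreads - 1) / nThreads; // ceiling division
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            // Clamp bounds so surplus threads get an empty range.
            final int lo = Math.min(n, t * chunk);
            final int hi = Math.min(n, lo + chunk);
            workers[t] = new Thread(() -> {
                for (int i = lo; i < hi; i++) {
                    c[i] = a[i] + b[i];
                }
            });
            workers[t].start();
        }
        // Wait for every chunk; join() also makes the workers' writes visible.
        for (Thread w : workers) {
            w.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 1_000_003;
        int[] a = new int[n];
        int[] b = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        int[] serial = new int[n];
        int[] parallel = new int[n];
        addSerial(a, b, serial);
        addParallel(a, b, parallel, 4);
        System.out.println(Arrays.equals(serial, parallel)); // prints "true"
    }
}
```

In software, a common way to spread that creation cost across many loop executions is to reuse worker threads, for example through a `java.util.concurrent.ExecutorService`. This is offered only for contrast and is not claimed to be the approach the paper evaluates.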