Accelerated graphics cards, or Graphics Processing Units (GPUs), have become ubiquitous in recent years. On the right kinds of problems, GPUs greatly surpass CPUs in raw performance. However, because they are difficult to program, GPUs are used only for a narrow class of special-purpose applications, and the raw processing power they offer goes unused most of the time. This paper presents an extension to a Java JIT compiler that executes suitable code on the GPU instead of the CPU. Both static and dynamic features are used to decide whether it is feasible and beneficial to offload a piece of code to the GPU. The paper presents a cost model that balances the speedup available from the GPU against the cost of transferring input and output data between main memory and GPU memory. The cost model is parameterized so that it can be applied to different hardware combinations. The paper also presents ways to overcome several obstacles to parallelization inherent in the design of the Java bytecode language: unstructured control flow, the lack of multi-dimensional arrays, the precise exception semantics, and the proliferation of indirect references.
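
To make the trade-off behind such a cost model concrete, here is a minimal sketch in Java. It is not the paper's model: the class name OffloadCostModel, the linear cost formulas, and all four parameters are assumptions chosen for illustration. It only shows the general shape of the decision, in which estimated GPU time (launch overhead, data transfer in both directions, and compute) is compared against estimated CPU time, with hardware-specific parameters supplied by the caller.

```java
// Hypothetical sketch of a parameterized offload decision.
// The linear formulas and every parameter below are illustrative
// assumptions, not the cost model described in the paper.
final class OffloadCostModel {
    private final double cpuNsPerIteration;  // assumed CPU time per loop iteration
    private final double gpuNsPerIteration;  // assumed amortized GPU time per iteration
    private final double transferNsPerByte;  // assumed main-memory <-> GPU-memory transfer cost
    private final double launchOverheadNs;   // assumed fixed cost of starting GPU work

    OffloadCostModel(double cpuNsPerIteration, double gpuNsPerIteration,
                     double transferNsPerByte, double launchOverheadNs) {
        this.cpuNsPerIteration = cpuNsPerIteration;
        this.gpuNsPerIteration = gpuNsPerIteration;
        this.transferNsPerByte = transferNsPerByte;
        this.launchOverheadNs = launchOverheadNs;
    }

    // Estimated time to run the loop on the CPU.
    double cpuCost(long iterations) {
        return iterations * cpuNsPerIteration;
    }

    // Estimated time to run the loop on the GPU, including copying
    // inputs to GPU memory and results back to main memory.
    double gpuCost(long iterations, long bytesIn, long bytesOut) {
        double transfer = (bytesIn + bytesOut) * transferNsPerByte;
        double compute = iterations * gpuNsPerIteration;
        return launchOverheadNs + transfer + compute;
    }

    // Offload only when the GPU estimate beats the CPU estimate.
    boolean shouldOffload(long iterations, long bytesIn, long bytesOut) {
        return gpuCost(iterations, bytesIn, bytesOut) < cpuCost(iterations);
    }

    public static void main(String[] args) {
        // Made-up parameters for one hypothetical CPU/GPU pairing.
        OffloadCostModel model = new OffloadCostModel(10.0, 1.0, 0.5, 50_000.0);
        System.out.println("small loop: " + model.shouldOffload(1_000, 8_000, 8_000));
        System.out.println("large loop: " + model.shouldOffload(1_000_000, 8_000_000, 8_000_000));
    }
}
```

With these made-up parameters the small loop stays on the CPU, because the fixed launch overhead and transfer time dominate, while the large loop is offloaded. Swapping in a different parameter set is how a model of this shape would be retargeted to another hardware combination.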