Dynamic management of scratch-pad memory space
Proceedings of the 38th annual Design Automation Conference
Efficient and precise array access analysis
ACM Transactions on Programming Languages and Systems (TOPLAS)
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Run-time Assisted Interprocedural Analysis of Memory Access Patterns
Run-time Assisted Interprocedural Analysis of Memory Access Patterns
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Hybrid analysis: static & dynamic memory reference analysis
International Journal of Parallel Programming
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Memory Coloring: A Compiler Approach for Scratchpad Memory Management
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
PyPy's approach to virtual machine construction
Companion to the 21st ACM SIGPLAN symposium on Object-oriented programming systems, languages, and applications
EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
RPython: a step towards reconciling dynamically and statically typed OO languages
Proceedings of the 2007 symposium on Dynamic languages
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Beginning DirectX 10 Game Programming
Beginning DirectX 10 Game Programming
Programming model for a heterogeneous x86 platform
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Software Pipelined Execution of Stream Programs on GPUs
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Tracing the meta-level: PyPy's tracing JIT compiler
Proceedings of the 4th workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems
Towards data tiling for whole programs in scratchpad memory allocation
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Copperhead: compiling an embedded data parallel language
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Adaptive input-aware compilation for graphics engines
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Hi-index | 0.00 |
A new compilation framework enables the execution of numerical-intensive applications, written in Python, on a hybrid execution environment formed by a CPU and a GPU. This compiler automatically computes the set of memory locations that need to be transferred to the GPU, and produces the correct mapping between the CPU and the GPU address spaces. Thus, the programming model implements a virtual shared address space. This framework is implemented as a combination of unPython, an ahead-of-time compiler from Python/NumPy to the C programming language, and jit4GPU, a just-in-time compiler from C to the AMD CAL interface. Experimental evaluation demonstrates that for some benchmarks the generated GPU code is 50 times faster than generated OpenMP code. The GPU performance also compares favorably with optimized CPU BLAS code for single-precision computations in most cases.