Today's desktop PCs feature a variety of parallel processing units. Developing applications that exploit this parallelism is a demanding task, and a programmer needs detailed knowledge of the hardware to implement them efficiently. CGiS is a data-parallel programming language providing a unified abstraction for two parallel processing units: graphics processing units (GPUs) and the vector processing units of CPUs. The CGiS compiler framework fully virtualizes the differences in capability and accessibility by mapping an abstract data-parallel programming model onto those targets. The applicability of CGiS for GPUs has been shown in previous work; this work presents the extension of the framework for SIMD instruction sets of CPUs. We show how to overcome the obstacles in mapping the abstract programming model of CGiS to the SIMD hardware. Our experimental results underline the viability of this approach: real-world applications can be implemented easily with CGiS and result in efficient code.