Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
GPGPU: general purpose computation on graphics hardware
ACM SIGGRAPH 2004 Course Notes
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
Programming model for a heterogeneous x86 platform
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Parallel search on video cards
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
PACUE: processor allocator considering user experience
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Computer systems are moving towards a heterogeneous architecture with a combination of one or more CPUs and one or more accelerator processors. Such heterogeneous systems pose a new challenge to the parallel programming community. Languages such as OpenCL and CUDA provide a programming environment for such systems. However, they focus on data-parallel programming, where the majority of computation is carried out by the accelerators. Our view is that, in the future, accelerator processors will be tightly coupled with the CPUs, be available in different system architectures (e.g., integrated and discrete), and systems will be dynamically reconfigurable. In this paper we advocate a balanced programming model in which computation is balanced between the CPU and its accelerators. This model supports sharing virtual memory between the CPU and the accelerator processors, so the same data structures can be manipulated by both sides. It also supports task-parallel as well as data-parallel programming, fine-grained synchronization, thread scheduling, and load balancing. This model not only leverages the computational capability of CPUs, but also allows dynamic system reconfiguration and supports different platform configurations. To help demonstrate the practicality of our programming model, we present performance results for a preliminary implementation on a computer system with an Intel® Core™ i7 processor and a discrete Larrabee processor. These results show that the model's most performance-critical part, its shared virtual memory implementation, simplifies programming without hurting performance.