A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Application development on hybrid systems
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Scalable Parallel Programming with CUDA
Queue - GPU Computing
Predictive Runtime Code Scheduling for Heterogeneous Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Heterogeneous multicore parallel programming for graphics processing units
Scientific Programming - Software Development for Multi-core Computing Systems
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications
PDP '10 Proceedings of the 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing
State-of-the-art in heterogeneous computing
Scientific Programming
Data-Aware Task Scheduling on Multi-accelerator Based Platforms
ICPADS '10 Proceedings of the 2010 IEEE 16th International Conference on Parallel and Distributed Systems
Improving programmability of heterogeneous many-core systems via explicit platform descriptions
Proceedings of the 4th International Workshop on Multicore Software Engineering
Explicit Platform Descriptions for Heterogeneous Many-Core Architectures
IPDPSW '11 Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
Benchmarking modern multiprocessors
Benchmarking modern multiprocessors
Hierarchical place trees: a portable abstraction for task parallelism and data movement
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
High-level support for pipeline parallelism on many-core architectures
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Hi-index | 0.00 |
Heterogeneous many-core systems constitute a viable approach for coping with power constraints in modern computer architectures and can now be found across the whole computing landscape ranging from mobile devices, to desktop systems and servers, all the way to high-end supercomputers and large-scale data centers. While these systems promise to offer superior performance-power ratios, programming heterogeneous many-core architectures efficiently has been shown to be notoriously difficult. Programmers typically are forced to take into account a plethora of low-level architectural details and usually have to resort to a combination of different programming models within a single application. In this paper we propose a platform description language (PDL) that enables to capture key architectural patterns of commonly used heterogeneous computing systems. PDL architecture descriptions support both programmers and toolchains by providing platform-specific information in a well-defined and explicit manner. We have developed a prototype source-to-source compilation framework that utilizes PDL descriptors to transform sequential task-based programs with source code annotations into a form that is convenient for execution on heterogeneous many-core systems. Our framework relies on a component-based approach that accommodates for different implementation variants of tasks, customized for different parts of a heterogeneous platform, and utilizes an advanced runtime system for exploiting parallelism through dynamic task scheduling. We show various usage scenarios of our PDL and demonstrate the effectiveness of our framework for a commonly used scientific kernel and a financial application on different configurations of a state-of-the-art CPU/GPU system.