StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures

Authors:
Cédric Augonnet;Samuel Thibault;Raymond Namyst;Pierre-André Wacrenier
Affiliations:
University of Bordeaux --- LaBRI --- INRIA Bordeaux Sud-Ouest;University of Bordeaux --- LaBRI --- INRIA Bordeaux Sud-Ouest;University of Bordeaux --- LaBRI --- INRIA Bordeaux Sud-Ouest;University of Bordeaux --- LaBRI --- INRIA Bordeaux Sud-Ouest
Venue:
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Year:
2009

Citing 9
Cited 43

PaStiX: A Parallel Sparse Direct Solver Based on a Static Scheduling for Mixed 1D/2D Block Distributions

IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Scheduling Strategies for Master-Slave Tasking on Heterogeneous Processor Platforms

IEEE Transactions on Parallel and Distributed Systems
MPI Microtask for programming the Cell Broadband Engine™ processor

IBM Systems Journal
CellSs: a programming model for the cell BE architecture

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Accelerating computing with the cell broadband engine processor

Proceedings of the 5th conference on Computing frontiers
Predictive Runtime Code Scheduling for Heterogeneous Architectures

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Mapping and Synchronizing Streaming Applications on Cell Processors

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Extending the OpenMP tasking model to allow dependent tasks

IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism

Dynamic Load Balancing of Matrix-Vector Multiplications on Roadrunner Compute Nodes

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Run-time optimizations for replicated dataflows on heterogeneous environments

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Automatic calibration of performance models on heterogeneous multicore architectures

Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Multi-GPU and multi-CPU parallelization for interactive physics simulations

Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Runtime multitasking support on polymorphic platforms

ACM SIGARCH Computer Architecture News
Cost-aware function migration in heterogeneous systems

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Improving programmability of heterogeneous many-core systems via explicit platform descriptions

Proceedings of the 4th International Workshop on Multicore Software Engineering
A static task partitioning approach for heterogeneous systems using OpenCL

CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
An exact algorithm for energy-efficient acceleration of task trees on CPU/GPU architectures

Proceedings of the 4th Annual International Conference on Systems and Storage
Processing data streams with hard real-time constraints on heterogeneous systems

Proceedings of the international conference on Supercomputing
MDR: performance model driven runtime for heterogeneous parallel platforms

Proceedings of the international conference on Supercomputing
Scaling scientific applications on clusters of hybrid multicore/GPU nodes

Proceedings of the 8th ACM International Conference on Computing Frontiers
HOMPI: a hybrid programming framework for expressing and deploying task-based parallelism

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Petri-nets as an intermediate representation for heterogeneous architectures

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Improving scalability and maintenance of software for high-performance scientific computing by combining MDE and frameworks

Proceedings of the 14th international conference on Model driven engineering languages and systems
Heterogeneous computing for vertebra detection and segmentation in x-ray images

Journal of Biomedical Imaging - Special issue on Parallel Computation in Medical Imaging Applications
Seamlessly portable applications: Managing the diversity of modern heterogeneous systems

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
DAGuE: A generic distributed DAG engine for High Performance Computing

Parallel Computing
Using explicit platform descriptions to support programming of heterogeneous many-core systems

Parallel Computing
Enabling task-level scheduling on heterogeneous platforms

Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Interfacing operating systems and polymorphic computing platforms based on the MOLEN programming paradigm

ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
MDE4HPC: an approach for using model-driven engineering in high-performance computing

SDL'11 Proceedings of the 15th international conference on Integrating System and Software Modeling
The tradeoffs of fused memory hierarchies in heterogeneous computing architectures

Proceedings of the 9th conference on Computing Frontiers
Improving performance of adaptive component-based dataflow middleware

Parallel Computing
Workload balancing on heterogeneous systems: a case study of sparse grid interpolation

Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Optimizing dataflow applications on heterogeneous environments

Cluster Computing
Scheduling processing of real-time data streams on heterogeneous multi-GPU systems

Proceedings of the 5th Annual International Systems and Storage Conference
PaTraCo: a framework enabling the transparent and efficient programming of heterogeneous compute networks

EG PGV'10 Proceedings of the 10th Eurographics conference on Parallel Graphics and Visualization
Automatic generation of software pipelines for heterogeneous parallel systems

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
High-performance general solver for extremely large-scale semidefinite programming problems

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
A VM-aware fairness scheduler on heterogenous multi-core platforms

Proceedings of the 2012 ACM Research in Applied Computation Symposium
Hierarchical partitioning algorithm for scientific computing on highly heterogeneous CPU + GPU clusters

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
StarPU-MPI: task programming over clusters of machines enhanced with accelerators

EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Correct and efficient work-stealing for weak memory models

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Exploring heterogeneous scheduling using the task-centric programming model

Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
A data-driven approach for executing the CG method on reconfigurable high-performance systems

ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
AutoTune: a plugin-driven approach to the automatic tuning of parallel applications

PARA'12 Proceedings of the 11th international conference on Applied Parallel and Scientific Computing
Fairness scheduler for virtual machines on heterogonous multi-core platforms

ACM SIGAPP Applied Computing Review
An automatic input-sensitive approach for heterogeneous task partitioning

Proceedings of the 27th international ACM conference on International conference on supercomputing
ViperVM: a runtime system for parallel functional high-performance computing on heterogeneous architectures

Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
A Self-tuning Scientific Framework using Model-Driven Engineering for Heterogeneous Execution Platforms

Proceedings of International Workshop on Adaptive Self-tuning Computing Systems

Abstract

In the field of HPC, the current hardware trend is to design multiprocessor architectures that feature heterogeneous technologies such as specialized coprocessors (e.g., Cell/BE SPUs) or data-parallel accelerators (e.g., GPGPUs). Approaching the theoretical performance of these architectures is a complex issue. Substantial effort has already been devoted to offloading parts of the computation efficiently. However, designing an execution model that unifies all computing units and their associated embedded memory remains a major challenge. We have thus designed StarPU, an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. The main goal of StarPU is to give numerical kernel designers a convenient way to generate parallel tasks over heterogeneous hardware on the one hand, and to easily develop and tune powerful scheduling algorithms on the other. We have developed several scheduling strategies that can be selected seamlessly at run time, and we have demonstrated their efficiency by analyzing the impact of those scheduling policies on several classical linear algebra algorithms that take advantage of multiple cores and GPUs at the same time. In addition to substantial improvements in execution time, we obtained consistent superlinear parallelism by actually exploiting the heterogeneous nature of the machine.