In the field of HPC, the current hardware trend is to design multiprocessor architectures that feature heterogeneous technologies such as specialized coprocessors (e.g., Cell/BE SPUs) or data-parallel accelerators (e.g., GPGPUs). Approaching the theoretical performance of these architectures is difficult. Substantial effort has already gone into offloading parts of the computation efficiently. However, designing an execution model that unifies all computing units and their associated embedded memory remains a major challenge. We have therefore designed StarPU, an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. The main goal of StarPU is twofold: to give numerical kernel designers a convenient way to generate parallel tasks over heterogeneous hardware, and to make it easy to develop and tune powerful scheduling algorithms. We have developed several scheduling strategies that can be selected seamlessly at run time, and we have demonstrated their efficiency by analyzing their impact on several classical linear algebra algorithms that use multiple CPU cores and GPUs at the same time. In addition to substantially reducing execution times, we consistently obtained superlinear parallelism by actually exploiting the heterogeneous nature of the machine.
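The abstract describes two ideas in words: tasks that can run on several kinds of processing units, and interchangeable scheduling policies that decide where each task runs. The Python sketch below is a toy model of those ideas, not StarPU's API (StarPU itself is a C library): the names `Task`, `Worker`, `schedule`, `POLICIES`, the two policies and all cost figures are hypothetical. It assumes independent tasks with known per-architecture execution times and ignores dependencies and data transfers. It compares a heterogeneity-blind "eager" policy with a greedy earliest-finish-time policy (a simple swapped-in heuristic, not necessarily one of the paper's strategies), and shows how the latter can beat the makespan a CPU and a GPU would reach if their throughputs merely added up, which is one way to read "superlinear parallelism".

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    """Hypothetical task with one estimated execution time per architecture.

    An architecture missing from `cost` means the task has no
    implementation for that kind of processing unit.
    """
    name: str
    cost: dict[str, float]


@dataclass
class Worker:
    """One processing unit (CPU core or GPU) and the time it becomes idle."""
    name: str
    arch: str
    available_at: float = 0.0
    assigned: list[str] = field(default_factory=list)


def pick_eager(task: Task, candidates: list[Worker]) -> Worker:
    # Heterogeneity-blind: take whichever capable worker becomes idle first.
    return min(candidates, key=lambda w: w.available_at)


def pick_eft(task: Task, candidates: list[Worker]) -> Worker:
    # Heterogeneity-aware: take the worker with the earliest expected finish time.
    return min(candidates, key=lambda w: w.available_at + task.cost[w.arch])


# Policies are plain functions looked up by name, so the choice is made at run time.
POLICIES: dict[str, Callable[[Task, list[Worker]], Worker]] = {
    "eager": pick_eager,
    "eft": pick_eft,
}


def schedule(tasks: list[Task], workers: list[Worker], policy: str) -> float:
    """Map independent tasks onto workers in list order; return the makespan.

    On ties, min() keeps the first worker, so results are deterministic.
    """
    pick = POLICIES[policy]
    for task in tasks:
        candidates = [w for w in workers if w.arch in task.cost]
        if not candidates:
            raise ValueError(f"no implementation of {task.name} for these workers")
        best = pick(task, candidates)
        best.available_at += task.cost[best.arch]
        best.assigned.append(task.name)
    return max(w.available_at for w in workers)


def make_workers(*archs: str) -> list[Worker]:
    return [Worker(f"{arch}{i}", arch) for i, arch in enumerate(archs)]


# Hypothetical workload: "A" kernels run 30x faster on the GPU,
# while "B" kernels take the same time on either unit.
tasks = [Task(f"A{i}", {"cpu": 30.0, "gpu": 1.0}) for i in range(10)]
tasks += [Task(f"B{i}", {"cpu": 2.0, "gpu": 2.0}) for i in range(10)]

cpu_only = schedule(tasks, make_workers("cpu"), "eft")
gpu_only = schedule(tasks, make_workers("gpu"), "eft")
# Makespan if CPU and GPU merely added their throughputs on this workload.
additive = 1.0 / (1.0 / cpu_only + 1.0 / gpu_only)

for policy in POLICIES:
    both = schedule(tasks, make_workers("cpu", "gpu"), policy)
    print(f"{policy}: {both} (additive bound {additive:.1f})")
```

With these made-up costs the sketch prints `eager: 30.0 (additive bound 27.4)` and `eft: 16.0 (additive bound 27.4)`. Sending every "A" kernel to the GPU and filling the CPU with "B" kernels lets the pair finish well before the additive bound, while the first-idle policy strands an "A" kernel on the slow CPU and falls short of it. A real runtime would also have to account for task dependencies and data movement between memories.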