In the field of HPC, the current hardware trend is to design multiprocessor architectures featuring heterogeneous technologies such as specialized coprocessors (e.g., Cell/BE) or data-parallel accelerators (e.g., GPUs). Approaching the theoretical performance of these architectures is a complex issue. Substantial efforts have already been devoted to efficiently offloading parts of the computations. However, designing an execution model that unifies all computing units and their associated embedded memory remains a major challenge. We therefore designed StarPU, an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. The main goal of StarPU is to provide numerical kernel designers with a convenient way to generate parallel tasks over heterogeneous hardware on the one hand, and to easily develop and tune powerful scheduling algorithms on the other hand. We have developed several scheduling strategies that can be selected seamlessly at run-time, and we have analyzed their efficiency on several algorithms running simultaneously over multiple cores and a GPU. In addition to substantial improvements in execution times, we have obtained consistent superlinear parallelism by actually exploiting the heterogeneous nature of the machine. Finally, we show that our dynamic approach competes with the highly optimized MAGMA library and overcomes the limitations of the corresponding static scheduling in a portable way. Copyright © 2010 John Wiley & Sons, Ltd.
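To make the idea of dynamically scheduling tasks over multiple cores and a GPU more concrete, the following is a minimal sketch in Python. It is not StarPU's API and not one of the strategies evaluated in the paper: it assumes a simple greedy earliest-finish-time heuristic, and every name in it (`Task`, `Worker`, `schedule`, `make_workers`, `TASKS`) as well as the per-device cost figures are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # Hypothetical estimated run time per device kind, e.g. {"cpu": 8.0, "gpu": 1.0}.
    cost: dict


@dataclass
class Worker:
    name: str
    kind: str                 # "cpu" or "gpu" in this sketch
    ready_at: float = 0.0     # time at which the worker becomes idle
    assigned: list = field(default_factory=list)


def make_workers(kinds):
    """Create fresh workers, one per entry in `kinds` (illustrative helper)."""
    return [Worker(name=f"{kind}{i}", kind=kind) for i, kind in enumerate(kinds)]


def schedule(tasks, workers):
    """Greedy earliest-finish-time placement (an assumed heuristic).

    Each task, in submission order, goes to the worker that would
    finish it soonest given that worker's current load and the task's
    estimated cost on that kind of device. Returns the makespan.
    """
    for task in tasks:
        candidates = [w for w in workers if w.kind in task.cost]
        if not candidates:
            raise ValueError(f"no worker can run task {task.name!r}")
        best = min(candidates, key=lambda w: w.ready_at + task.cost[w.kind])
        best.ready_at += task.cost[best.kind]
        best.assigned.append(task.name)
    return max(w.ready_at for w in workers)


# Made-up workload: "gemm" tasks favour the GPU, "panel" tasks favour the CPU.
TASKS = []
for i in range(4):
    TASKS.append(Task(f"gemm{i}", {"cpu": 8.0, "gpu": 1.0}))
    TASKS.append(Task(f"panel{i}", {"cpu": 2.0, "gpu": 4.0}))

cpu_only = schedule(TASKS, make_workers(["cpu", "cpu"]))
gpu_only = schedule(TASKS, make_workers(["gpu"]))
hybrid = schedule(TASKS, make_workers(["cpu", "cpu", "gpu"]))
print(cpu_only, gpu_only, hybrid)
```

With these made-up costs, the hybrid configuration finishes well ahead of either the two-CPU or the single-GPU configuration, because each kind of task lands on the device that suits it. This is only meant to illustrate why exploiting heterogeneity can yield more than the sum of the individual devices' throughput on a uniform workload; real schedulers must also account for data transfers and task dependencies, which this sketch ignores.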