A storage-efficient WY representation for products of householder transformations
SIAM Journal on Scientific and Statistical Computing
The Omega test: a fast and practical integer programming algorithm for dependence analysis
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Data flow computing: theory and practice
Data flow computing: theory and practice
The data locality of work stealing
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Matrix algorithms
Operating Systems Theory
PARA '95 Proceedings of the Second International Workshop on Applied Parallel Computing, Computations in Physics, Chemistry and Engineering Science
Compact DAG representation and its symbolic scheduling
Journal of Parallel and Distributed Computing
Grid'5000: A Large Scale And Highly Reconfigurable Experimental Grid Testbed
International Journal of High Performance Computing Applications
SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Multi-threading and one-sided communication in parallel LU factorization
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Updating an LU Factorization with Pivoting
ACM Transactions on Mathematical Software (TOMS)
Parallel tiled QR factorization for multicore architectures
Concurrency and Computation: Practice & Experience
Distributed SBP Cholesky factorization algorithms with near-optimal scheduling
ACM Transactions on Mathematical Software (TOMS)
Workflow Global Computing with YML
GRID '06 Proceedings of the 7th IEEE/ACM International Conference on Grid Computing
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
The impact of multicore on math software
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Minimal data copy for dense linear algebra factorization
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications
PDP '10 Proceedings of the 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures
Concurrency and Computation: Practice & Experience - Euro-Par 2009
Enabling large-scale scientific workflows on petascale resources using MPI master/worker
Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond
From serial loops to parallel execution on distributed systems
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Parallelizing the execution of sequential scripts
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Feature-based analysis of large-scale spatio-temporal sensor data on hybrid architectures
International Journal of High Performance Computing Applications
International Journal of High Performance Computing Applications
Multifrontal QR factorization for multicore architectures over runtime systems
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Computers & Mathematics with Applications
Hi-index | 0.00 |
The frenetic development of the current architectures places a strain on the current state-of-the-art programming environments. Harnessing the full potential of such architectures is a tremendous task for the whole scientific computing community. We present DAGuE a generic framework for architecture aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. Applications we consider can be expressed as a Direct Acyclic Graph of tasks with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size independent format that can be queried on-demand to discover data dependencies, in a totally distributed fashion. DAGuE assigns computation threads to the cores, overlaps communications and computations and uses a dynamic, fully-distributed scheduler based on cache awareness, data-locality and task priority. We demonstrate the efficiency of our approach, using several micro-benchmarks to analyze the performance of different components of the framework, and a linear algebra factorization as a use case.