Increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most current tools for developing parallel applications for hierarchical systems concentrate on a single processor type (e.g., accelerators) and do not coordinate several heterogeneous processors. Here, we show that using all of the heterogeneous computing resources can significantly improve application performance. Our approach, which optimizes applications at run-time by efficiently coordinating application task execution on all available processing units, is evaluated in the context of replicated dataflow applications. The proposed techniques were developed and implemented in an integrated run-time system targeting both intra- and inter-node parallelism. Experimental results with a complex real-world biomedical application show that our approach nearly doubles the performance of the GPU-only implementation on a distributed heterogeneous accelerator cluster.
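To make the idea of coordinating task execution across all processing units concrete, the following is a minimal sketch of one simple policy: demand-driven assignment, where whichever device becomes idle first takes the next task. This is an illustrative assumption, not the scheduler described in the paper; the function name `simulate`, the device labels, and the per-task costs (1 s on a GPU, 3 s on a CPU core) are all hypothetical.

```python
import heapq


def simulate(task_costs, devices):
    """Simulate demand-driven scheduling (hypothetical policy, not the paper's).

    task_costs: list of dicts mapping a device kind to the task's run time
                in seconds on that kind of device (made-up numbers).
    devices:    list of device kinds, one entry per processing unit.
    Returns (makespan, number of tasks executed per device kind).
    """
    # Min-heap of (time at which the device becomes idle, device index).
    idle_at = [(0.0, i) for i in range(len(devices))]
    heapq.heapify(idle_at)
    counts = {kind: 0 for kind in devices}
    makespan = 0.0
    for cost in task_costs:
        # The device that becomes idle earliest takes the next task.
        start, i = heapq.heappop(idle_at)
        kind = devices[i]
        finish = start + cost[kind]
        counts[kind] += 1
        makespan = max(makespan, finish)
        heapq.heappush(idle_at, (finish, i))
    return makespan, counts


# Twelve identical tasks; costs are illustrative assumptions only.
tasks = [{"gpu": 1.0, "cpu": 3.0} for _ in range(12)]
gpu_only = simulate(tasks, ["gpu"])
combined = simulate(tasks, ["gpu", "cpu"])
print(gpu_only)   # (12.0, {'gpu': 12})
print(combined)   # (9.0, {'gpu': 9, 'cpu': 3})
```

In this toy setting, letting the slower CPU core pull work alongside the GPU shortens the makespan from 12 s to 9 s. The figures only illustrate the mechanism and say nothing about the gains reported in the abstract.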