The MHETA Execution Model for Heterogeneous Clusters

Authors:
Mario Nakazawa; David K. Lowenthal; Wendou Zhou
Affiliations:
Berea College, Kentucky; University of Georgia; University of Georgia
Venue:
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Year:
2005

Abstract

The availability of inexpensive "off the shelf" machines increases the likelihood that parallel programs run on heterogeneous clusters of machines. These programs are increasingly likely to be out of core, meaning that portions of their datasets must be stored on disk during program execution. This results in a significant per-iteration I/O cost. This paper describes an execution model, called MHETA, which is the key component in finding an effective data distribution on heterogeneous clusters. MHETA takes into account the computation, communication, and I/O costs of iterative scientific applications. MHETA uses information extracted automatically from a single iteration to predict the execution time of the remaining iterations. Results show that MHETA predicts, with 98% accuracy on average, the execution time of several scientific benchmarks (with and without prefetching) and one full-scale scientific program that use pipelined and other communication. MHETA is thus an effective tool when searching for the most effective distribution on a heterogeneous cluster.
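
The idea of extrapolating from one measured iteration can be illustrated with a minimal sketch. This is not MHETA itself: the abstract does not give the model's equations, so everything below is an assumption made for illustration. The names `NodeCosts`, `predict_remaining_time`, and `prediction_accuracy` are hypothetical, the three cost components are simply summed per node (ignoring any overlap of I/O with computation that prefetching would provide), and nodes are assumed to synchronize once per iteration so that the slowest node sets the pace.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NodeCosts:
    """Costs (in seconds) observed on one node during a single iteration.

    Hypothetical structure for illustration; not taken from the paper.
    """
    compute: float
    communicate: float
    io: float

    def iteration_time(self) -> float:
        # Assumption: no overlap between computation, communication, and I/O.
        return self.compute + self.communicate + self.io


def predict_remaining_time(nodes: Sequence[NodeCosts],
                           remaining_iterations: int) -> float:
    """Extrapolate from one measured iteration to the remaining ones.

    Assumption: nodes synchronize every iteration, so each iteration
    takes as long as the slowest node.
    """
    per_iteration = max(node.iteration_time() for node in nodes)
    return per_iteration * remaining_iterations


def prediction_accuracy(predicted: float, actual: float) -> float:
    """Accuracy as one minus the relative error (an assumed definition)."""
    return 1.0 - abs(predicted - actual) / actual


if __name__ == "__main__":
    # A fast node with little on-disk data and a slow, I/O-bound node.
    cluster = [NodeCosts(2.0, 0.5, 1.0), NodeCosts(1.0, 0.5, 3.0)]
    print(predict_remaining_time(cluster, remaining_iterations=10))
```

In this toy setting, changing the data distribution shifts compute and I/O cost between nodes, and a search procedure would compare the predicted times of candidate distributions; how MHETA actually derives and combines these costs is described in the paper, not here.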