A bridging model for parallel computation
Communications of the ACM
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Practical prefetching techniques for multiprocessor file systems
Distributed and Parallel Databases - Selected papers from the first international conference on parallel and distributed information systems
A model and compilation strategy for out-of-core data parallel programs
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Software—Practice & Experience
Informed prefetching and caching
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Automatic compiler-inserted I/O prefetching for out-of-core applications
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Automatic data layout for distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
SMARTS: exploiting temporal locality and parallelism through vertical execution
ICS '99 Proceedings of the 13th international conference on Supercomputing
Selecting tile shape for minimal execution time
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Dynamic data distribution with control flow analysis
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Accurate data redistribution cost estimation in software distributed shared memory systems
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
A grid-enabled MPI: message passing in heterogeneous distributed computing systems
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
Improving the Performance of Out-of-Core Computations
ICPP '97 Proceedings of the international Conference on Parallel Processing
Virtual Memory Management in Data Parallel Applications
HPCN Europe '99 Proceedings of the 7th International Conference on High-Performance Computing and Networking
Automatic Selection of Dynamic Data Partitioning Schemes for Distributed-Memory Multicomputers
LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Compiler support for out-of-core arrays on parallel machines
FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Adaptive Scheduling under Memory Pressure on Multiprogrammed Clusters
CCGRID '02 Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid
Panda: fast access to persistent arrays using high-level interfaces and server directed input/output
Panda: fast access to persistent arrays using high-level interfaces and server directed input/output
Automatic parallel code generation for tiled nested loops
Proceedings of the 2004 ACM symposium on Applied computing
Reducing file system latency using a predictive approach
USTC'94 Proceedings of the USENIX Summer 1994 Technical Conference on USENIX Summer 1994 Technical Conference - Volume 1
Disk-directed I/O for MIMD multiprocessors
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Performance prediction with skeletons
Cluster Computing
Adaptive resource remapping through live migration of virtual machines
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I
Hi-index | 0.00 |
The availability of inexpensive "off the shelf" machines increases the likelihood that parallel programs run on heterogeneous clusters of machines. These programs are increasingly likely to be out of core, meaning that portions of their datasets must be stored on disk during program execution. This results in significant, per-iteration, I/O cost.This paper describes an execution model, called MHETA, which is the key component to finding an effective data distribution on heterogeneous clusters. MHETA takes into account computation, communication, and I/O costs of iterative scientific applications. MHETA uses automatically extracted information from a single iteration to predict the execution time of the remaining iterations. Results show that MHETA predicts with on average 98% accuracy the execution time of several scientific benchmarks (with and without prefetching) and one full-scale scientific program that utilize pipelined and other communication. MHETA is thus an effective tool when searching for the most effective distribution on a heterogeneous cluster.