Proceedings of the 3rd international workshop on Emerging computational methods for the life sciences
The MapReduce programming model provides an easy way to execute pleasantly parallel applications. Many data-intensive life science applications fit this programming model and benefit from the scalability it delivers. One such application is AutoDock, a suite of automated tools for predicting the bound conformations of flexible ligands to macromolecular targets. However, researchers also need sufficient computation and storage resources to benefit fully from MapReduce. For example, a typical AutoDock-based virtual screening experiment consists of a very large number of docking processes for multiple ligands and is often time-consuming to run on a single MapReduce cluster. Although commercial clouds can provide virtually unlimited computation and storage resources on demand, financial, security, and possibly other concerns lead many researchers to keep running experiments on several small clusters, each with a limited number of nodes, that cannot unleash the full power of MapReduce. In this paper, we present a hierarchical MapReduce framework that gathers computation resources from different clusters and runs MapReduce jobs across them. The global controller in our framework splits the data set into partitions, dispatches them to multiple "local" MapReduce clusters, and balances the workload by assigning tasks according to the capabilities of each cluster and of each node. The local results are then returned to the global controller for a global reduction. Our experimental evaluation using AutoDock over MapReduce shows that our load-balancing algorithm achieves a promising workload distribution across multiple clusters and thus minimizes the overall execution time of the entire MapReduce job.
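The split, dispatch, and global-reduce flow described above can be illustrated with a minimal Python sketch. It is not the framework's implementation: the function names (`partition_by_capacity`, `local_mapreduce`, `hierarchical_mapreduce`) are hypothetical, each cluster's capability is assumed to be a single integer weight (for example, a core count), and the "local clusters" are simulated in-process rather than being real MapReduce deployments.

```python
from functools import reduce


def partition_by_capacity(items, capacities):
    """Split items into contiguous chunks sized in proportion to capacity.

    capacities maps a (hypothetical) cluster name to a positive integer weight.
    Cumulative integer arithmetic guarantees every item is assigned exactly once.
    """
    total = sum(capacities.values())
    chunks, start, cumulative = {}, 0, 0
    for name, weight in capacities.items():
        cumulative += weight
        end = len(items) * cumulative // total
        chunks[name] = items[start:end]
        start = end
    return chunks


def local_mapreduce(chunk, map_fn, reduce_fn):
    """Stand-in for one local cluster: map every item, then reduce locally."""
    mapped = [map_fn(x) for x in chunk]
    return reduce(reduce_fn, mapped) if mapped else None


def hierarchical_mapreduce(items, capacities, map_fn, reduce_fn):
    """Global controller: partition, run each local job, then reduce globally."""
    chunks = partition_by_capacity(items, capacities)
    local_results = [
        local_mapreduce(chunk, map_fn, reduce_fn) for chunk in chunks.values()
    ]
    # Clusters that received no work contribute nothing to the global reduce.
    partials = [r for r in local_results if r is not None]
    return reduce(reduce_fn, partials) if partials else None


if __name__ == "__main__":
    clusters = {"A": 1, "B": 2, "C": 3}  # assumed relative capabilities
    data = list(range(12))
    sizes = {k: len(v) for k, v in partition_by_capacity(data, clusters).items()}
    print(sizes)
    print(hierarchical_mapreduce(data, clusters, lambda x: x * x, lambda a, b: a + b))
```

The sketch requires the reduce function to be associative, since it is applied once inside each local cluster and again at the global level; the final result then matches what a single-cluster run would produce.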