MapReduce offers an easy-to-use programming paradigm for processing large data sets, making it an attractive model for distributed volunteer computing systems. However, unlike the dedicated resources on which MapReduce has mostly been deployed, volunteer computing systems have significantly higher rates of node unavailability. Furthermore, their nodes are not fully controlled by the MapReduce framework. Consequently, we found the data and task replication schemes adopted by existing MapReduce implementations woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. Our tests on an emulated volunteer computing system, which uses a 60-node cluster where each node has a hardware configuration similar to that of a typical computer in a student lab, demonstrate that MOON can deliver a three-fold performance improvement over Hadoop in volatile, volunteer computing environments.
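To illustrate why a fixed replication factor struggles as node unavailability rises, the following is a minimal sketch and not MOON's actual algorithm. It assumes nodes fail independently, so a block with r replicas is unreachable with probability p^r when each node is unavailable with probability p. The function name `replicas_needed` and the rates 0.05 and 0.4 are hypothetical, chosen only for illustration.

```python
def replicas_needed(unavailability: float, target_availability: float) -> int:
    """Smallest replica count r such that 1 - unavailability**r >= target.

    Assumption (not from the paper): node failures are independent, so all
    r replicas are unavailable at once with probability unavailability**r.
    """
    if not 0.0 < unavailability < 1.0:
        raise ValueError("unavailability must be strictly between 0 and 1")
    if not 0.0 < target_availability < 1.0:
        raise ValueError("target_availability must be strictly between 0 and 1")
    r = 1
    while 1.0 - unavailability ** r < target_availability:
        r += 1
    return r


if __name__ == "__main__":
    # Hypothetical rates: a mostly reliable node versus a volatile volunteer node.
    for p in (0.05, 0.4):
        print(f"unavailability={p}: {replicas_needed(p, 0.99)} replicas for 99% availability")
```

Under this simplified model, the replica count needed for the same availability target grows quickly with the unavailability rate, which is consistent with the motivation for supplementing volunteer nodes with a small set of dedicated ones.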