Many distributed computing models have been developed for high-performance processing of large-scale scientific data. Among them, MapReduce is a popular and widely used fine-grained parallel runtime. Workflows integrate and coordinate distributed, heterogeneous components to solve a computational problem, which may comprise several MapReduce jobs. However, existing workflow solutions offer limited support for important features such as fault tolerance and efficient execution of iterative applications. In this paper, we propose HyMR, a hybrid MapReduce workflow system built on two different MapReduce frameworks. HyMR optimizes scheduling for individual jobs and supports fault tolerance for the entire workflow pipeline. A distributed file system is used for fast data sharing between jobs. We compare a pipeline using HyMR with a workflow model based on a single MapReduce framework. Our results show that the hybrid model achieves higher efficiency.