MapReduce provides a parallel and scalable programming model for data-intensive business and scientific applications. Hadoop, its de facto open-source implementation, supports parallel processing of large datasets, with automatic data partitioning and distribution, load balancing, and fault tolerance. Meanwhile, scientific workflow management systems such as Kepler, Taverna, Triana, and Pegasus have demonstrated their ability to help domain scientists solve scientific problems by combining different data and computing resources. By integrating Hadoop with Kepler, we provide an easy-to-use architecture that lets users compose and execute MapReduce applications in Kepler scientific workflows. Our implementation demonstrates that many characteristics of scientific workflow management systems, e.g., a graphical user interface and component reuse and sharing, complement those of MapReduce. Using the presented Hadoop components in Kepler, scientists can apply MapReduce to their domain-specific problems and connect the resulting MapReduce tasks with other tasks in a workflow through the Kepler graphical user interface. We validate the feasibility of our approach with a word count use case.
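To make the word count use case concrete, the following is a minimal sketch of the MapReduce programming model in plain Python. It is not the Kepler or Hadoop implementation described here: it runs in a single process, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for illustration. It only shows the three logical steps (map, group by key, reduce) that a framework such as Hadoop would distribute across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group step: collect all values emitted for the same key.

    In a real cluster the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary (an assumption).
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = [
        "Kepler runs workflows",
        "Hadoop runs MapReduce jobs",
        "Kepler runs Hadoop",
    ]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

In a workflow setting, the map and reduce functions are the parts a scientist would supply, while partitioning, grouping, and fault handling are left to the framework.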