MapReduce is promising for developing scalable data-intensive applications in both business and science. However, few existing scientific workflow systems can benefit from the MapReduce programming model. We propose a workflow system for integrating, structuring, and orchestrating MapReduce jobs in scientific data-intensive workflows. The system consists of a simple C++ workflow design API, a job scheduler, and a runtime support system for the Hadoop and Sector/Sphere frameworks. A data-intensive climate satellite processing and analysis application is developed as a use case and evaluation of the workflow system. The evaluation shows that the workflow system can automate the steps of the climate data-intensive application, from data gridding to complex data analysis. The performance of the climate analysis application is significantly improved by the MapReduce-enabled workflow system compared with sequential, embarrassingly parallel methods, and the overhead of the workflow system is negligible. A graphical user interface for the workflow system is still under development.
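To make the described architecture concrete, the following minimal C++ sketch models the general idea of a workflow design API coupled with a job scheduler: stages (e.g., MapReduce jobs) are declared with their dependencies, and a simple scheduler dispatches them in dependency order. All class and function names here (Workflow, Stage, addStage, execute) are illustrative assumptions, not the system's actual API; a real runtime support layer would submit each stage to Hadoop or Sector/Sphere rather than print a message.

// Illustrative sketch only: names and structure are hypothetical and do not
// reproduce the paper's actual C++ API. The sketch models workflow stages
// declared with dependencies and executed by a simple dependency-order scheduler.
#include <functional>
#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

// One node in the workflow DAG, e.g. a MapReduce job handled by the runtime layer.
struct Stage {
    std::string name;
    std::vector<std::string> dependsOn;   // names of upstream stages
    std::function<void()> run;            // placeholder for job submission
};

// A minimal scheduler: runs each stage only after all of its dependencies finish.
class Workflow {
public:
    void addStage(Stage s) {
        std::string key = s.name;
        stages_[key] = std::move(s);
    }

    void execute() {
        std::map<std::string, int> pending;                        // unmet dependency counts
        std::map<std::string, std::vector<std::string>> dependents;
        std::queue<std::string> ready;

        for (const auto& [name, stage] : stages_) {
            pending[name] = static_cast<int>(stage.dependsOn.size());
            for (const auto& dep : stage.dependsOn) dependents[dep].push_back(name);
            if (stage.dependsOn.empty()) ready.push(name);
        }
        while (!ready.empty()) {
            std::string name = ready.front();
            ready.pop();
            stages_[name].run();                                   // submit and wait for the job
            for (const auto& next : dependents[name])
                if (--pending[next] == 0) ready.push(next);
        }
    }

private:
    std::map<std::string, Stage> stages_;
};

int main() {
    Workflow wf;
    // Hypothetical two-stage climate pipeline mirroring the use case in the abstract:
    // grid the raw satellite data first, then run the analysis job.
    wf.addStage({"gridding", {}, [] { std::cout << "run gridding MapReduce job\n"; }});
    wf.addStage({"analysis", {"gridding"}, [] { std::cout << "run analysis MapReduce job\n"; }});
    wf.execute();
    return 0;
}

In the climate use case described above, the data gridding and data analysis steps would correspond to such workflow stages, with the runtime support system handling the actual job submission to Hadoop or Sector/Sphere.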