Existing work does not provide a flexible, dataset-oriented data flow mechanism that meets the complex requirements of scientific Grid workflow applications. In this paper we address this problem by introducing a data collection concept and corresponding collection distribution constructs, which are inspired by High Performance Fortran (HPF) but applied to Grid workflow applications. Based on these constructs, more fine-grained data flows can be specified at the level of an abstract workflow language, for example mapping a portion of a dataset to an activity, or independently distributing multiple datasets, not necessarily with the same number of data elements, onto loop iterations. Our approach reduces data duplication, optimizes data transfers, and simplifies porting workflow applications onto the Grid. We have extended AGWL with these concepts and implemented the corresponding runtime support in ASKALON. We apply our approach to several real-world scientific workflow applications and report performance results.
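
The idea of distributing collections onto loop iterations can be illustrated with a minimal sketch. It is not AGWL syntax and not the ASKALON runtime: the function names `select_portion` and `block_distribute`, the sample data, and the choice of an HPF-style BLOCK distribution are assumptions made purely for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_portion(collection: Sequence[T], start: int, stop: int) -> List[T]:
    """Hypothetical helper: map only elements [start, stop) of a dataset to an activity."""
    return list(collection[start:stop])


def block_distribute(collection: Sequence[T], iterations: int) -> List[List[T]]:
    """Hypothetical helper: split a collection into contiguous blocks, one per
    loop iteration, in the style of an HPF BLOCK distribution."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    # Ceiling division: every block except possibly the last ones has this size.
    block = -(-len(collection) // iterations)
    return [list(collection[i * block:(i + 1) * block]) for i in range(iterations)]


# Two datasets with different numbers of elements (illustrative data only).
images = [f"img{i}" for i in range(10)]
params = ["p0", "p1", "p2", "p3"]
iterations = 4

# Each dataset is distributed independently onto the same loop iterations,
# so an iteration receives only its own block instead of a full copy.
image_blocks = block_distribute(images, iterations)
param_blocks = block_distribute(params, iterations)

for k in range(iterations):
    print(k, image_blocks[k], param_blocks[k])
```

In this sketch each iteration receives only the elements it needs, which is the intuition behind the reduced data duplication and transfer volume claimed above; other distributions (for example cyclic) could be substituted without changing the calling code.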