Modern scientific collaborations require large-scale integration of various processes. Higher-level dataflow languages are used on top of parallel and distributed dataflow systems to speed up the development of data-intensive workflow programs, ease their optimization, and make the code more maintainable. In this paper, we present the rationale, design, and application of advanced support for modeling and optimizing dataflows for data mining and integration processes. The optimization is based on modeling dataflows before execution and on extending the registry of process activities with advanced annotations. We also describe the overall process that leads from a dynamic model to a static model, which serves as input for the optimization algorithms. The approach is implemented within an advanced graphical user interface, called the Process Designer, to support semi-automatic optimization, and within a dataflow execution platform, called the Gateway. It can be adapted to any dataflow language implementation. The Process Designer architecture, based on modern (meta-)modeling concepts, supports validated transformations between the external textual and internal graphical representations of the targeted dataflow language, which increases the productivity and robustness of the implementation process.