Logical Optimization of Dataflows for Data Mining and Integration Processes

  • Authors:
  • Alexander Wohrer;Eduard Mehofer;Peter Brezany

  • Venue:
  • E-SCIENCEW '10 Proceedings of the 2010 Sixth IEEE International Conference on e-Science Workshops
  • Year:
  • 2010

Abstract

Modern scientific collaborations require large-scale data mining and integration processes. Their investigations involve multi-disciplinary expertise and large-scale computational experiments over large amounts of data held in distributed data repositories that run various software systems and are managed by different organizations. Higher-level dataflow languages are used on top of parallel dataflow systems to enable faster program development and more maintainable code. Logical and physical optimization should be applied to a dataflow prior to its execution to improve performance. In this paper we present the rationale, theory, design and application of logical optimization of dataflows for data mining and integration processes. A dataflow model is defined, and several optimization algorithms are developed, namely dead element elimination, process re-ordering, parallelization, and data by-passing. The first research prototype of the framework has been implemented in the context of the ADMIRE Data Mining and Integration Process Designer for logical optimization of specifications expressed in the DISPEL language, which was developed in the ADMIRE project.
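
The abstract names dead element elimination without describing it, so the following is only a generic illustration of the idea, not the paper's algorithm and not DISPEL code. It assumes a dataflow can be modelled as a directed graph of processing elements, and that an element is "dead" when none of its output can reach a required sink. The function name `eliminate_dead_elements`, the dictionary-based graph, and the element names in the example are all hypothetical.

```python
from collections import deque


def eliminate_dead_elements(edges, sinks):
    """Return a copy of the dataflow graph without dead elements.

    edges: dict mapping each element to the list of elements it feeds.
    sinks: set of elements whose results are actually required.
    (Both representations are assumptions made for this sketch.)
    """
    # Build the reverse adjacency: for each element, who produces its input.
    producers = {}
    for src, dsts in edges.items():
        producers.setdefault(src, [])
        for dst in dsts:
            producers.setdefault(dst, []).append(src)

    # Walk backwards from the required sinks; everything reached is live.
    live = set()
    queue = deque(s for s in sinks if s in producers)
    while queue:
        node = queue.popleft()
        if node in live:
            continue
        live.add(node)
        queue.extend(producers[node])

    # Keep only live elements and the connections between them.
    return {
        node: [d for d in dsts if d in live]
        for node, dsts in edges.items()
        if node in live
    }


if __name__ == "__main__":
    # Hypothetical dataflow: "debug_stats" feeds nothing that is required.
    flow = {
        "read": ["filter", "debug_stats"],
        "filter": ["train"],
        "debug_stats": [],
        "train": ["deliver"],
        "deliver": [],
    }
    print(eliminate_dead_elements(flow, {"deliver"}))
```

In this example, `debug_stats` is dropped because no path leads from it to `deliver`, while the chain from `read` to `deliver` is kept unchanged. The backward traversal keeps the cost linear in the number of elements and connections.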