Service Oriented KDD: A Framework for Grid Data Mining Workflows

  • Authors:
  • Marco Lackovic;Domenico Talia;Paolo Trunfio

  • Venue:
  • ICDMW '08 Proceedings of the 2008 IEEE International Conference on Data Mining Workshops
  • Year:
  • 2008

Abstract

Weka4WS is an extension of the Weka toolkit that supports remote execution of data mining tasks as Grid services. A first version of Weka4WS, which supported concurrent execution of multiple data mining tasks on remote Grid nodes, was presented in previous work. In this paper we present a new version that also supports the composition and execution of data mining workflows on a Grid. This new version extends the KnowledgeFlow component of Weka by allowing the data mining tasks of a workflow to run in parallel on different machines, thereby reducing execution time. Besides the performance improvement, designing data mining applications as workflows makes it possible to define typical patterns and reuse them in different contexts. We describe the architecture of the system, the functionality of the Weka4WS KnowledgeFlow, and some usage examples with their performance.
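
The idea of running independent workflow tasks in parallel can be illustrated with a minimal Python sketch. It is not Weka4WS code and does not use the Weka or Grid service APIs: the function names `run_task` and `run_workflow` are hypothetical, the task names are only labels, and a local thread pool with simulated delays stands in for remote Grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def run_task(name, seconds):
    """Stand-in for a remote data mining task; sleeps to simulate work."""
    time.sleep(seconds)
    return f"{name}: done"


def run_workflow(tasks, max_workers):
    """Run independent tasks concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each submit() starts a task without waiting for the previous one,
        # much as independent workflow nodes could be dispatched to
        # different machines.
        futures = [pool.submit(run_task, name, secs) for name, secs in tasks]
        return [future.result() for future in futures]


# Hypothetical workflow with three independent branches (labels only).
tasks = [("J48", 0.2), ("NaiveBayes", 0.2), ("KMeans", 0.2)]

start = time.perf_counter()
results = run_workflow(tasks, max_workers=len(tasks))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f} s")
```

With one worker per task, the total time in this sketch is governed by the slowest task rather than by the sum of all task durations, which is the effect the abstract attributes to parallel execution on different machines.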