Debellor: A Data Mining Platform with Stream Architecture

  • Authors: Marcin Wojnarski
  • Affiliations: Faculty of Mathematics, Informatics and Mechanics, Warsaw University, Warszawa, Poland 02-097
  • Venue: Transactions on Rough Sets IX
  • Year: 2008

Abstract

This paper introduces Debellor (www.debellor.org), an open source, extensible data mining platform with a stream-based architecture, in which all data transfers between elementary algorithms take the form of a stream of samples. Data streaming enables the implementation of scalable algorithms that can efficiently process large volumes of data exceeding available memory. This is very important for data mining research and applications, since the most challenging data mining tasks involve voluminous data, either produced by a data source or generated at some intermediate stage of a complex data processing network. The advantages of data streaming are illustrated by experiments with clustering time series. The experimental results show that even for moderate-size data sets streaming is indispensable for successful execution of algorithms; without it, the algorithms run hundreds of times slower or simply crash due to memory shortage. Stream architecture is particularly useful in application domains such as time series analysis, image recognition or mining data streams. It is also the only efficient architecture for implementing online algorithms. The algorithms currently available on the Debellor platform include all classifiers from the Rseslib and Weka libraries and all filters from Weka.
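
The idea of passing data between algorithms as a stream of samples can be illustrated with a minimal sketch. It is written in Python with generators and does not use or reproduce Debellor's actual API; the names `source`, `scale`, `RunningMean` and `run_pipeline` are hypothetical and chosen only for this illustration. Each stage pulls one sample at a time from the previous stage, so no stage ever holds the full data set in memory, and the final stage is a simple online algorithm that updates its state per sample.

```python
from typing import Iterable, Iterator, List

Sample = List[float]


def source(n: int) -> Iterator[Sample]:
    """Hypothetical data source: yields samples one by one instead of building a list."""
    for i in range(n):
        yield [float(i), float(2 * i)]


def scale(stream: Iterable[Sample], factor: float) -> Iterator[Sample]:
    """Hypothetical filter stage: transforms each sample as it passes through."""
    for sample in stream:
        yield [x * factor for x in sample]


class RunningMean:
    """Online algorithm: keeps only a count and a mean vector, whatever the stream length."""

    def __init__(self) -> None:
        self.count = 0
        self.mean: List[float] = []

    def consume(self, stream: Iterable[Sample]) -> "RunningMean":
        for sample in stream:
            if not self.mean:
                # Allocate the state on the first sample, once the dimension is known.
                self.mean = [0.0] * len(sample)
            self.count += 1
            for j, x in enumerate(sample):
                # Incremental mean update: no past samples are stored.
                self.mean[j] += (x - self.mean[j]) / self.count
        return self


def run_pipeline(n: int) -> RunningMean:
    """Connect source -> filter -> online learner; samples flow through lazily."""
    return RunningMean().consume(scale(source(n), 0.5))


if __name__ == "__main__":
    model = run_pipeline(1001)
    print(model.count, model.mean)
```

In this sketch memory use depends on the dimension of a sample rather than on the number of samples, which is the property that lets stream-based processing handle data larger than available memory.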