Regression on evolving multi-relational data streams

  • Authors:
  • Elena Ikonomovska;Sašo Džeroski

  • Affiliations:
  • Institute Jožef Stefan, Jamova cesta, Ljubljana, Slovenia;Institute Jožef Stefan, Jamova cesta, Ljubljana, Slovenia

  • Venue:
  • Proceedings of the 2011 Joint EDBT/ICDT Ph.D. Workshop
  • Year:
  • 2011

Abstract

In the last decade, researchers have recognized the need for increased attention to a type of knowledge discovery application in which the data analyzed is not finite, but streams into the system continuously and endlessly. Data streams are ubiquitous, entering almost every area of modern life. As a result, processing, managing and learning from multiple data streams have become important and challenging tasks for the data mining, database and machine learning communities. Although a substantial body of algorithms for processing and learning from data streams has been developed, most of the work focuses on one-dimensional numerical data streams (time series) or on a single multi-dimensional data stream. Only a few of the existing solutions consider the most realistic scenario, in which data can be incomplete, can be correlated with other streams of information, and can arrive from multiple heterogeneous sources. This paper discusses the requirements and difficulties of learning from multiple multi-dimensional data streams inter-linked according to a pre-defined semantic schema (multi-relational data streams). The main research problem is to develop a time-efficient, resource-aware methodology for linking and exploring information that arrives independently and asynchronously from its respective sources. The resulting framework has to provide, at any time, error-bounded approximate answers to the aggregate queries commonly issued in the process of multi-relational data mining. In particular, we focus on the task of learning regression trees and their variants (model trees, option trees, multi-target trees) from multiple correlated streaming sources. To the best of our knowledge, no other work has previously addressed the problem of learning regression trees from multi-relational data streams.
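
To make the notion of an "error-bounded approximate answer to an aggregate query" concrete, the following is a minimal Python sketch of one common approach: a running mean over a stream of bounded values, paired with a Hoeffding-style confidence bound. The abstract does not say which bounding technique the authors use, so the choice of the Hoeffding inequality here is an assumption, as are the class name `RunningMean` and the parameters `value_range` and `delta`, all of which are hypothetical.

```python
import math


class RunningMean:
    """Hypothetical one-pass estimator of a stream's mean with an error bound.

    Assumes every value lies in an interval of known width `value_range`.
    """

    def __init__(self, value_range, delta=0.05):
        self.value_range = value_range  # assumed known: max value minus min value
        self.delta = delta              # allowed probability that the bound fails
        self.n = 0                      # number of values seen so far
        self.mean = 0.0                 # current estimate of the mean

    def update(self, x):
        # Incremental mean update: constant memory, each value is seen once.
        self.n += 1
        self.mean += (x - self.mean) / self.n

    def error_bound(self):
        # Two-sided Hoeffding bound: at a given query time, the estimate is
        # within this distance of the true mean with probability >= 1 - delta.
        if self.n == 0:
            return math.inf
        return self.value_range * math.sqrt(
            math.log(2.0 / self.delta) / (2.0 * self.n)
        )


# Usage: the answer can be requested at any point and tightens as data arrives.
estimator = RunningMean(value_range=1.0, delta=0.05)
for value in [0.0, 1.0, 0.5, 0.5]:
    estimator.update(value)
    print(estimator.n, estimator.mean, estimator.error_bound())
```

The estimator keeps only a count and a mean, so its memory use does not grow with the stream, and the bound shrinks in proportion to the inverse square root of the number of values seen. The bound holds for a single query time; a guarantee that holds simultaneously over all query times would need a stricter correction.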