Regression on evolving multi-relational data streams

Authors: Elena Ikonomovska; Sašo Džeroski
Affiliation (both authors): Institute Jožef Stefan, Jamova cesta, Ljubljana, Slovenia
Venue: Proceedings of the 2011 Joint EDBT/ICDT Ph.D. Workshop
Year: 2011

Abstract

In the last decade, researchers have recognized the need for increased attention to knowledge discovery applications in which the data analyzed is not finite, but streams into the system continuously and endlessly. Data streams are ubiquitous, entering almost every area of modern life. As a result, processing, managing and learning from multiple data streams have become important and challenging tasks for the data mining, database and machine learning communities. Although a substantial body of algorithms for processing and learning from data streams has been developed, most of the work focuses on one-dimensional numerical data streams (time series) or on a single multi-dimensional data stream. Only a few of the existing solutions consider the most realistic scenario, in which data can be incomplete, can be correlated with other streams of information, and can arrive from multiple heterogeneous sources. This paper discusses the requirements and difficulties of learning from multiple multi-dimensional data streams that are inter-linked according to a pre-defined semantic schema (multi-relational data streams). The main research problem is to develop a time-efficient, resource-aware methodology for linking and exploring information that arrives independently and asynchronously from its respective sources. The resulting framework has to provide, at any time, error-bounded approximate answers to the aggregate queries commonly issued in the process of multi-relational data mining. In particular, we focus on the task of learning regression trees and their variants (model trees, option trees, multi-target trees) from multiple correlated streaming sources. To the best of our knowledge, no previous work has addressed the problem of learning regression trees from multi-relational data streams.
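
To make "error-bounded approximate answers to aggregate queries" concrete, the following is a minimal sketch of one well-known stream synopsis, the Count-Min sketch, used to estimate the size of a join between two streams that share a key. It is an illustration under stated assumptions, not necessarily the method the authors adopt: the class name, the function `estimate_join_size`, the integer join keys, and the toy streams are all hypothetical. With width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉, the estimate never falls below the true join size and, with probability at least 1 − δ, exceeds it by at most ε·n₁·n₂, where n₁ and n₂ are the numbers of tuples seen on each stream.

```python
import math
import random
from collections import Counter


class CountMinSketch:
    """Count-Min sketch over integer keys (illustrative, not the paper's code)."""

    _PRIME = (1 << 61) - 1  # Mersenne prime for the (a*x + b) mod p hash family

    def __init__(self, epsilon, delta, seed=0):
        self.width = math.ceil(math.e / epsilon)       # columns per row
        self.depth = math.ceil(math.log(1.0 / delta))  # number of hash rows
        rng = random.Random(seed)
        # One (a, b) pair per row; sketches built with the same seed share hashes.
        self._coeffs = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(self.depth)
        ]
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _bucket(self, row, key):
        a, b = self._coeffs[row]
        return ((a * key + b) % self._PRIME) % self.width

    def add(self, key, count=1):
        """Process one arriving tuple with the given join key."""
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count

    def estimate(self, key):
        """Point query: an upper bound on the true frequency of `key`."""
        return min(
            self.table[row][self._bucket(row, key)] for row in range(self.depth)
        )


def estimate_join_size(left, right):
    """Estimate the equi-join size of two streams from their sketches.

    Both sketches must use identical hash functions (same epsilon, delta, seed).
    """
    if (left.width, left._coeffs) != (right.width, right._coeffs):
        raise ValueError("sketches must share dimensions and hash functions")
    # Row-wise inner product; the minimum over rows is the tightest upper bound.
    return min(
        sum(x * y for x, y in zip(lrow, rrow))
        for lrow, rrow in zip(left.table, right.table)
    )


# Toy usage: two streams whose tuples arrive interleaved (asynchronously).
rng = random.Random(7)
sketch_a = CountMinSketch(epsilon=0.01, delta=0.01, seed=42)
sketch_b = CountMinSketch(epsilon=0.01, delta=0.01, seed=42)
exact_a, exact_b = Counter(), Counter()
for _ in range(2000):
    key = rng.randrange(50)
    if rng.random() < 0.5:
        sketch_a.add(key)
        exact_a[key] += 1
    else:
        sketch_b.add(key)
        exact_b[key] += 1

true_size = sum(exact_a[k] * exact_b[k] for k in exact_a)
approx_size = estimate_join_size(sketch_a, sketch_b)
print(true_size, approx_size)
```

Because each sketch is updated independently and can be queried at any time, this kind of synopsis suits sources that arrive at different rates; its memory footprint depends only on ε and δ, not on the length of the streams.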