Learning decision trees from dynamic data streams

Authors:
João Gama; Pedro Medas; Pedro Rodrigues
Affiliations:
LIACC, FEP, Univ. do Porto, Porto, Portugal; LIACC, Univ. do Porto, Porto, Portugal; LIACC, Univ. do Porto, Porto, Portugal
Venue:
Proceedings of the 2005 ACM symposium on Applied computing
Year:
2005

Abstract

This paper presents a system for inducing forests of functional trees from data streams that is able to detect concept drift. The Ultra Fast Forest of Trees (UFFT) is an incremental algorithm that works online, processes each example in constant time, and performs a single scan over the training examples. It uses analytical techniques to choose the splitting criteria and information gain to estimate the merit of each possible splitting test. For multi-class problems, the algorithm grows a binary tree for each possible pair of classes, leading to a forest of trees. Decision nodes and leaves contain naive-Bayes classifiers that play different roles during the induction process. Naive-Bayes classifiers in leaves are used to classify test examples. Naive-Bayes classifiers in inner nodes can be used as multivariate splitting tests, if chosen by the splitting criteria, and are used to detect drift in the distribution of the examples that traverse the node. When drift is detected, the entire subtree rooted at that node is pruned. The naive-Bayes classifiers at leaves, the splitting tests based on the outcome of naive-Bayes, and the naive-Bayes drift detectors at decision nodes are all obtained directly from the sufficient statistics required to compute the splitting criteria, with no additional computation. This is a major advantage in the context of high-speed data streams. The methodology was tested on artificial and real-world data sets. The experimental results show very good performance in comparison to a batch decision tree learner and a high capacity to detect and react to drift.
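The abstract says the leaf classifiers are read directly from the sufficient statistics already kept for the splitting criteria. The following is a minimal sketch, assuming (our assumption; the abstract does not list them) that those statistics are a per-class count plus a running mean and variance of each numeric attribute. It shows how a Gaussian naive-Bayes prediction can be computed from such counters with constant work per training example. The classes `GaussianStats` and `LeafNaiveBayes` are hypothetical names for illustration.

```python
import math
from collections import defaultdict


class GaussianStats:
    """Running count, mean and variance of one numeric attribute (Welford's update)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance, floored so a constant attribute never divides by zero.
        if self.n < 2:
            return 1e-9
        return max(self.m2 / (self.n - 1), 1e-9)

    def log_pdf(self, x: float) -> float:
        var = self.variance
        return -0.5 * math.log(2 * math.pi * var) - (x - self.mean) ** 2 / (2 * var)


class LeafNaiveBayes:
    """Hypothetical leaf: per-class counts plus per-class, per-attribute Gaussian statistics."""

    def __init__(self, n_attributes: int) -> None:
        self.class_counts = defaultdict(int)
        self.stats = defaultdict(lambda: [GaussianStats() for _ in range(n_attributes)])

    def update(self, x: list, y: str) -> None:
        # Constant work per example: one counter and one GaussianStats per attribute.
        self.class_counts[y] += 1
        for j, value in enumerate(x):
            self.stats[y][j].update(value)

    def predict(self, x: list) -> str:
        # Gaussian naive Bayes: log prior plus the sum of per-attribute log densities.
        total = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / total)
            score += sum(self.stats[c][j].log_pdf(v) for j, v in enumerate(x))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


leaf = LeafNaiveBayes(n_attributes=2)
for x, y in [([1.0, 2.0], "a"), ([1.2, 1.8], "a"), ([4.0, 5.0], "b"), ([4.2, 5.3], "b")]:
    leaf.update(x, y)
print(leaf.predict([1.1, 2.1]))  # a
```

The same per-class means and variances could also feed a splitting criterion, which is the kind of reuse the abstract highlights; whether UFFT stores exactly these quantities is not stated here.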
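The abstract says splits are chosen analytically and scored with information gain, but it gives no formulas. One analytical option, assumed here for illustration and not quoted from the paper, models each of the two classes at a node as a normal distribution over a numeric attribute. The cut point is placed where the count-weighted densities are equal, i.e. n1·N(x; mu1, sd1) = n2·N(x; mu2, sd2), which after taking logarithms is a quadratic in x. The class counts on each side of the cut are then estimated from the normal CDF and scored by information gain. The function names are hypothetical.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def entropy(counts: list) -> float:
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def gaussian_cut_point(n1, mu1, sd1, n2, mu2, sd2) -> float:
    """Solve n1*N(x; mu1, sd1) == n2*N(x; mu2, sd2), preferring a root between the means."""
    # Log of both sides rearranged to a*x^2 + b*x + c = 0.
    a = 1.0 / (2 * sd2**2) - 1.0 / (2 * sd1**2)
    b = mu1 / sd1**2 - mu2 / sd2**2
    c = mu2**2 / (2 * sd2**2) - mu1**2 / (2 * sd1**2) + math.log((n1 * sd2) / (n2 * sd1))
    midpoint = (mu1 + mu2) / 2
    if abs(a) < 1e-12:  # equal variances: the equation is linear
        return -c / b if b != 0 else midpoint
    disc = b * b - 4 * a * c
    if disc < 0:  # the weighted densities never cross: fall back to the midpoint
        return midpoint
    roots = [(-b + s * math.sqrt(disc)) / (2 * a) for s in (1, -1)]
    lo, hi = min(mu1, mu2), max(mu1, mu2)
    candidates = [r for r in roots if lo <= r <= hi] or roots
    return min(candidates, key=lambda r: abs(r - midpoint))


def split_info_gain(n1, mu1, sd1, n2, mu2, sd2, cut) -> float:
    """Information gain of the test x <= cut, with side counts estimated from the Gaussians."""
    left = [n1 * normal_cdf(cut, mu1, sd1), n2 * normal_cdf(cut, mu2, sd2)]
    right = [n1 - left[0], n2 - left[1]]
    total = n1 + n2
    children = sum(sum(side) / total * entropy(side) for side in (left, right) if sum(side) > 0)
    return entropy([n1, n2]) - children


cut = gaussian_cut_point(100, 0.0, 1.0, 100, 4.0, 1.0)
print(cut)  # 2.0
print(split_info_gain(100, 0.0, 1.0, 100, 4.0, 1.0, cut))
```

With equal counts and equal variances the cut reduces to the midpoint of the two means; better separated classes yield a higher gain, which is what lets such a score rank candidate attributes.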
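The abstract states that the naive-Bayes classifier at an inner node is used to detect drift and that a detection prunes the whole subtree below that node, but it does not give the statistical test. This sketch assumes a common error-rate rule: track the node's error rate p and s = sqrt(p(1 - p)/n), remember the (p, s) pair with the smallest p + s, warn when p + s >= p_min + 2·s_min, and signal drift when p + s >= p_min + 3·s_min. The rule, its thresholds, the 30-example warm-up, and the `ErrorRateDriftDetector` and `InnerNode` classes are all assumptions for illustration.

```python
import math


class ErrorRateDriftDetector:
    """Flags a significant rise in a stream of 0/1 prediction errors (assumed rule, see lead-in)."""

    def __init__(self, min_examples: int = 30, warning_k: float = 2.0, drift_k: float = 3.0) -> None:
        self.min_examples = min_examples
        self.warning_k = warning_k
        self.drift_k = drift_k
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: bool) -> str:
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_examples:
            return "in-control"
        # If the error rate so far is exactly 0 or 1, s is 0 and the thresholds
        # collapse onto p_min; practical implementations usually guard this case.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s >= self.p_min + self.drift_k * self.s_min:
            self.reset()
            return "drift"
        if p + s >= self.p_min + self.warning_k * self.s_min:
            return "warning"
        return "in-control"


class InnerNode:
    """Hypothetical decision node: the errors of its naive-Bayes classifier feed the detector."""

    def __init__(self) -> None:
        self.children = []
        self.detector = ErrorRateDriftDetector()

    def observe(self, naive_bayes_was_wrong: bool) -> str:
        status = self.detector.update(naive_bayes_was_wrong)
        if status == "drift":
            self.children = []  # prune the whole subtree; the node becomes a leaf again
        return status


node = InnerNode()
node.children = ["left subtree", "right subtree"]  # placeholders for real subtrees
# 300 examples with a 10% error rate, then the concept changes and every prediction is wrong.
stream = [k % 10 == 0 for k in range(300)] + [True] * 100
statuses = [node.observe(err) for err in stream]
print(statuses.index("drift"), node.children)  # 311 []
```

In this synthetic stream the detector stays quiet during the stable phase, raises warnings a few examples after the change, and signals drift shortly afterwards, at which point the node discards its subtree.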
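For multi-class problems the abstract describes one binary tree per pair of classes. The sketch below shows only that decomposition: with k classes there are k(k - 1)/2 binary members, and each training example updates just the members whose pair contains its label. How the pairwise trees are combined at prediction time is not stated in the abstract; the majority vote used here is an assumption. `NearestMeanBinary` is a hypothetical stand-in for a binary tree so the example stays self-contained.

```python
from collections import Counter
from itertools import combinations


class NearestMeanBinary:
    """Hypothetical stand-in for one binary tree: predicts the class with the closer running mean."""

    def __init__(self, pair: tuple) -> None:
        self.pair = pair
        self.sums = {c: 0.0 for c in pair}
        self.counts = {c: 0 for c in pair}

    def learn(self, x: float, y: str) -> None:
        self.sums[y] += x
        self.counts[y] += 1

    def predict(self, x: float) -> str:
        def distance(c: str) -> float:
            if self.counts[c] == 0:
                return float("inf")
            return abs(x - self.sums[c] / self.counts[c])

        return min(self.pair, key=distance)


class PairwiseForest:
    """One binary model per unordered pair of classes; k classes give k * (k - 1) / 2 models."""

    def __init__(self, classes, make_binary=NearestMeanBinary) -> None:
        self.members = {pair: make_binary(pair) for pair in combinations(sorted(classes), 2)}

    def learn(self, x: float, y: str) -> None:
        # An example only trains the members whose class pair contains its label.
        for pair, model in self.members.items():
            if y in pair:
                model.learn(x, y)

    def predict(self, x: float) -> str:
        # Assumed combination rule: each member votes for one of its two classes.
        votes = Counter(model.predict(x) for model in self.members.values())
        return votes.most_common(1)[0][0]


forest = PairwiseForest(["a", "b", "c"])
for x, y in [(0.0, "a"), (0.4, "a"), (5.0, "b"), (5.2, "b"), (10.0, "c"), (9.8, "c")]:
    forest.learn(x, y)
print(len(forest.members), forest.predict(0.2), forest.predict(5.1), forest.predict(9.5))  # 3 a b c
```

Because each example touches only the k - 1 members that involve its class, the per-example cost grows linearly in the number of classes even though the forest holds a quadratic number of trees.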