Decision Tree Evolution Using Limited Number of Labeled Data Items from Drifting Data Streams

Authors:
Wei Fan; Yi-an Huang; Philip S. Yu
Affiliations:
IBM T. J. Watson Research, Hawthorne, NY; Georgia Institute of Technology, Atlanta, GA; IBM T. J. Watson Research, Hawthorne, NY
Venue:
ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Year:
2004

Abstract

Most previously proposed methods for mining data streams make the unrealistic assumption that a "labelled" data stream is readily available and can be mined at any time. However, in most real-world problems, labelled data streams are rarely available immediately. For this reason, models are typically reconstructed only periodically, when labelled data become available. This passive stream mining model has several drawbacks. We propose a new concept of demand-driven active data mining. In active mining, the loss of the model is either continuously guessed without using any true class labels or estimated, whenever necessary, from a small number of instances whose actual class labels are verified at an affordable cost. When the estimated loss exceeds a tolerable threshold, the model evolves using a small number of instances with verified true class labels. Previous work on active mining concentrated on error guessing and estimation. In this paper, we discuss several approaches to decision tree evolution.
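The abstract does not spell out how the loss is guessed or estimated, so the following is only a minimal Python sketch of the demand-driven monitoring loop under stated assumptions: 0-1 loss, a tree exposed as a hypothetical `leaf_of` function plus per-leaf training class counts, and a hypothetical `label_oracle` standing in for the paid verification of true labels. The guessing heuristic (each leaf's training error weighted by the share of new instances landing in it) and the policy of buying labels only when the guess exceeds the threshold are illustrative assumptions, not necessarily the authors' method. The tree-evolution step itself is not shown.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, Sequence, Tuple

# Hypothetical types for illustration: one numeric feature, integer class labels.
Instance = float
Label = int


def guess_loss(leaf_of: Callable[[Instance], Hashable],
               leaf_class_counts: Dict[Hashable, Dict[Label, int]],
               unlabeled: Sequence[Instance]) -> float:
    """Guess the 0-1 loss on new data without any true labels.

    Assumed heuristic: a leaf's expected error is its training-time
    misclassification rate (1 - majority share); the guess weights each
    leaf by the fraction of new, unlabeled instances that fall into it.
    """
    if not unlabeled:
        return 0.0
    hits = Counter(leaf_of(x) for x in unlabeled)
    guess = 0.0
    for leaf, k in hits.items():
        counts = leaf_class_counts[leaf]
        leaf_error = 1.0 - max(counts.values()) / sum(counts.values())
        guess += (k / len(unlabeled)) * leaf_error
    return guess


def estimate_loss(predict: Callable[[Instance], Label],
                  verified: Sequence[Tuple[Instance, Label]]) -> float:
    """Estimate the 0-1 loss from a small sample with verified true labels."""
    if not verified:
        raise ValueError("need at least one verified instance")
    errors = sum(1 for x, y in verified if predict(x) != y)
    return errors / len(verified)


def monitor(window: Sequence[Instance],
            leaf_of: Callable[[Instance], Hashable],
            leaf_class_counts: Dict[Hashable, Dict[Label, int]],
            predict: Callable[[Instance], Label],
            label_oracle: Callable[[Instance], Label],
            threshold: float,
            k: int,
            rng: random.Random) -> Tuple[bool, float]:
    """Return (should_evolve, loss) for one window of the stream.

    Assumed policy: the label-free guess is always computed; labels are
    bought (up to k calls to the hypothetical label_oracle) only when the
    guess exceeds the tolerable threshold.
    """
    guess = guess_loss(leaf_of, leaf_class_counts, window)
    if guess <= threshold:
        return False, guess
    sample = rng.sample(list(window), min(k, len(window)))
    verified = [(x, label_oracle(x)) for x in sample]  # each call has a cost
    estimate = estimate_loss(predict, verified)
    return estimate > threshold, estimate


if __name__ == "__main__":
    # Toy decision stump: leaf "L" for x < 0.5, leaf "R" otherwise.
    counts = {"L": {0: 90, 1: 10}, "R": {0: 20, 1: 80}}
    majority = {leaf: max(c, key=c.get) for leaf, c in counts.items()}

    def leaf_of(x: Instance) -> str:
        return "L" if x < 0.5 else "R"

    def predict(x: Instance) -> Label:
        return majority[leaf_of(x)]

    # Hypothetical drifted concept: the true boundary has moved to 0.75.
    def oracle(x: Instance) -> Label:
        return 1 if x > 0.75 else 0

    window = [0.2, 0.3, 0.6, 0.7, 0.8, 0.9]
    print(monitor(window, leaf_of, counts, predict, oracle,
                  threshold=0.15, k=6, rng=random.Random(0)))
```

With the toy stump, the guess (about 0.167) exceeds the threshold of 0.15, so all six instances are verified; two of them are misclassified under the drifted boundary, and the sketch prints `(True, 0.3333333333333333)`. Because the guess uses only training-time leaf statistics, it reacts to shifts in where new instances land but not to label changes alone, so a practical system would probably also estimate the loss from verified labels from time to time.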