Flexible decision tree for data stream classification in the presence of concept change, noise and missing values

Authors:
Sattar Hashemi; Ying Yang
Affiliations:
School of Electrical Engineering and Computer Sciences, Shiraz University, Shiraz, Iran; Australian Taxation Office, Melbourne, Australia
Venue:
Data Mining and Knowledge Discovery
Year:
2009

Abstract

In recent years, classification learning for data streams has become an important and active research topic. A major challenge posed by data streams is that their underlying concepts can change over time, which requires current classifiers to be revised accordingly and in a timely manner. To detect concept change, a common methodology is to monitor the online classification accuracy: if accuracy drops below some threshold value, a concept change is deemed to have taken place. An implicit assumption behind this methodology is that any drop in classification accuracy can be interpreted as a symptom of concept change. Unfortunately, this assumption is often violated in the real world, where data streams carry noise that can also cause a significant reduction in classification accuracy. To compound the problem, traditional noise cleansing methods are ill-suited to data streams: they normally need to scan the data multiple times, whereas learning from data streams can afford only a single pass because of the data's high speed and huge volume. Another open problem in data stream classification is how to deal with missing values. When new instances containing missing values arrive, how a learning model should classify them and how it should update itself accordingly remain largely unexplored. To solve these problems, this paper proposes a novel classification algorithm, flexible decision tree (FlexDT), which extends fuzzy logic to data stream classification. The advantages are threefold. First, FlexDT offers a flexible structure that handles concept change effectively and efficiently. Second, FlexDT is robust to noise; hence it can prevent noise from interfering with classification accuracy, so that an accuracy drop can be safely attributed to concept change. Third, it deals with missing values in an elegant way.
Extensive evaluations are conducted to compare FlexDT with representative existing data stream classification algorithms using a large suite of data streams and various statistical tests. Experimental results suggest that FlexDT offers a significant benefit to data stream classification in real-world scenarios where concept change, noise and missing values coexist.
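
The accuracy-threshold methodology that the abstract describes as common practice can be illustrated with a minimal sketch. This is not FlexDT and is not taken from the paper; the class name `AccuracyDriftMonitor`, the sliding-window design, and the `window_size` and `threshold` parameters are all assumptions made for illustration.

```python
from collections import deque


class AccuracyDriftMonitor:
    """Hypothetical baseline detector: signals a possible concept change
    when accuracy over a sliding window falls below a fixed threshold."""

    def __init__(self, window_size=100, threshold=0.7):
        # Each entry records whether one recent prediction was correct.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def accuracy(self):
        # Fraction of correct predictions currently in the window.
        if not self.window:
            return 1.0
        return sum(self.window) / len(self.window)

    def update(self, predicted, actual):
        """Record one prediction outcome; return True if a change is signalled."""
        self.window.append(predicted == actual)
        # Do not signal until the window is full, to avoid early false alarms.
        if len(self.window) < self.window.maxlen:
            return False
        return self.accuracy() < self.threshold


# Usage: four correct predictions, then two errors (assumed toy stream).
monitor = AccuracyDriftMonitor(window_size=4, threshold=0.75)
stream = [(1, 1), (0, 0), (1, 1), (0, 0), (1, 0), (1, 0)]
signals = [monitor.update(pred, label) for pred, label in stream]
```

Note that the monitor sees only whether each prediction was correct. Two mislabelled instances caused by noise would trigger the same signal as a genuine concept change, which is the weakness the abstract points out.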