Learning from concept drifting data streams with unlabeled data

Authors:
Xindong Wu;Peipei Li;Xuegang Hu
Affiliations:
School of Computer Science and Information Engineering, Hefei University of Technology, Anhui 230009, China and Department of Computer Science, University of Vermont, Burlington, VT 05405, USA;School of Computer Science and Information Engineering, Hefei University of Technology, Anhui 230009, China;School of Computer Science and Information Engineering, Hefei University of Technology, Anhui 230009, China
Venue:
Neurocomputing
Year:
2012

Abstract

Most existing work on data stream classification assumes that all streaming data are labeled and that class labels are available immediately. In real-world applications such as credit fraud and intrusion detection, this assumption does not always hold, which makes learning from concept drifting data streams with unlabeled data a challenge. With this motivation, we propose SUN, a Semi-supervised classification algorithm for data streams with concept drifts and UNlabeled data. In SUN, a clustering algorithm developed from k-Modes produces concept clusters at the leaves of an incremental decision tree. Potential concept drifts are distinguished from noise according to the deviations between historical concept clusters and new ones. Extensive studies on both synthetic and real-world data demonstrate that SUN performs well compared to several state-of-the-art online supervised and semi-supervised algorithms, even when more than 90% of the data are unlabeled. We therefore conclude that SUN provides a promising framework for tackling concept drifting data streams with unlabeled data.
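
The cluster-comparison idea in the abstract can be pictured with a minimal Python sketch. It is not the SUN implementation: the incremental decision tree is left out, plain k-Modes with Hamming distance stands in for the paper's clustering variant, and the function names, the seeding rule, and the two thresholds (noise_limit, drift_limit) are hypothetical choices made only for illustration.

```python
from collections import Counter

def hamming(a, b):
    """Number of attribute positions where two categorical tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(rows):
    """Per-attribute most frequent value, used as a k-Modes cluster centre."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def k_modes(rows, k, iters=10):
    """Plain k-Modes. Assumption: the first k rows seed the modes."""
    modes = [tuple(r) for r in rows[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k), key=lambda i: hamming(r, modes[i]))
            clusters[nearest].append(r)
        # An empty cluster keeps its previous mode.
        new_modes = [mode_of(c) if c else modes[i] for i, c in enumerate(clusters)]
        if new_modes == modes:
            break
        modes = new_modes
    return modes

def deviation(old_modes, new_modes):
    """Mean distance from each new mode to its closest old mode, scaled to [0, 1]."""
    n_attrs = len(new_modes[0])
    total = sum(min(hamming(n, o) for o in old_modes) for n in new_modes)
    return total / (len(new_modes) * n_attrs)

def judge(old_modes, new_modes, noise_limit=0.1, drift_limit=0.5):
    """Hypothetical rule: small deviation is noise, large deviation is drift."""
    d = deviation(old_modes, new_modes)
    if d >= drift_limit:
        return "drift"
    if d >= noise_limit:
        return "noise"
    return "stable"

# Toy categorical chunks arriving at one leaf (made-up data).
history = [("a", "x", "p"), ("b", "y", "q"), ("a", "x", "p"), ("b", "y", "q")]
noisy = [("a", "x", "r"), ("b", "y", "q"), ("a", "x", "r")]
drifted = [("c", "z", "r"), ("d", "w", "s"), ("c", "z", "r")]

old = k_modes(history, k=2)
print(judge(old, k_modes(noisy, k=2)))
print(judge(old, k_modes(drifted, k=2)))
```

Using two thresholds rather than one mirrors the abstract's distinction between noise and drift; how SUN actually measures deviation and sets its limits is described in the paper itself, not here.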