Categorizing and mining concept drifting data streams

Authors:
Peng Zhang; Xingquan Zhu; Yong Shi
Affiliations:
Chinese Academy of Sciences, Beijing, China; Florida Atlantic University, Boca Raton, FL, USA; University of Nebraska at Omaha, Omaha, NE, USA
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008

Citing 18
Cited 24

Abstract

Mining concept drifting data streams is a defining challenge for data mining research. Recent years have seen a large body of work on detecting changes and building prediction models from stream data, yet the types of concept drifting, and the impact of each type on mining algorithms, remain only vaguely understood. In this paper, we first categorize concept drifting into two scenarios, Loose Concept Drifting (LCD) and Rigorous Concept Drifting (RCD), and then propose a separate solution for each. For LCD data streams, where concepts in adjacent data chunks are sufficiently close to each other, we apply the kernel mean matching (KMM) method to minimize the discrepancy between the data chunks in the kernel space. This minimization produces weighted instances, which are used to build a classifier ensemble that handles the concept drifting stream. For RCD data streams, where the genuine concepts in adjacent data chunks may change randomly and rapidly, we propose a new Optimal Weights Adjustment (OWA) method that determines the optimal weight values for classifiers trained from the most recent (up-to-date) data chunk, so that these classifiers form an accurate ensemble for predicting instances in the yet-to-come data chunk. Experiments on synthetic and real-world datasets show that the weighted instance approach is preferable when concept drifting is mainly caused by changes in the class prior probability, whereas the weighted classifier approach is preferable when concept drifting is mainly triggered by changes in the conditional probability.
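The LCD solution relies on kernel mean matching. As a minimal sketch, assuming NumPy, a Gaussian (RBF) kernel, and the standard KMM quadratic program (minimize 0.5 * beta^T K beta - kappa^T beta subject to 0 <= beta_i <= B and |sum(beta) - n_train| <= n_train * eps), the code below reweights the instances of a labeled chunk so that their kernel mean moves toward that of the next, unlabeled chunk. The function names, the projected-gradient solver (used here in place of a general QP solver), and the defaults for sigma, upper (B) and eps are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix for 2-D arrays a (n, d) and b (m, d):
    entries exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def project_box_sum(v, upper, lo, hi, iters=100):
    """Euclidean projection onto {0 <= beta_i <= upper, lo <= sum(beta) <= hi}."""
    beta = np.clip(v, 0.0, upper)
    total = beta.sum()
    if lo <= total <= hi:
        return beta  # box projection already satisfies the sum constraint
    target = hi if total > hi else lo
    # sum(clip(v - lam, 0, upper)) is non-increasing in lam: bisect for the shift.
    left, right = v.min() - upper, v.max()
    for _ in range(iters):
        lam = 0.5 * (left + right)
        if np.clip(v - lam, 0.0, upper).sum() > target:
            left = lam
        else:
            right = lam
    return np.clip(v - 0.5 * (left + right), 0.0, upper)


def kmm_weights(x_train, x_test, sigma=1.0, upper=10.0, eps=0.1, n_iter=1000):
    """Hypothetical KMM helper: weights for x_train that match x_test's kernel mean.

    Solves  min 0.5 * beta^T K beta - kappa^T beta
    s.t.    0 <= beta_i <= upper,  |sum(beta) - n_train| <= n_train * eps
    by projected gradient descent (an assumption; any QP solver would do).
    """
    n_tr, n_te = len(x_train), len(x_test)
    K = rbf_kernel(x_train, x_train, sigma)
    kappa = (n_tr / n_te) * rbf_kernel(x_train, x_test, sigma).sum(axis=1)
    lo, hi = n_tr * (1.0 - eps), n_tr * (1.0 + eps)
    step = 1.0 / np.linalg.eigvalsh(K)[-1]  # 1 / Lipschitz constant of the gradient
    beta = np.ones(n_tr)
    for _ in range(n_iter):
        grad = K @ beta - kappa
        beta = project_box_sum(beta - step * grad, upper, lo, hi)
    return beta
```

The returned weights can be passed as per-instance weights when training a base learner, for example through the `sample_weight` argument that many scikit-learn estimators accept in `fit`; how the paper then combines such weighted learners into its ensemble is not specified in the abstract.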
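The abstract does not state the objective that OWA optimizes, so the following is only a generic stand-in for the idea of tuning classifier weights on the most recent labeled chunk. Assuming each ensemble member outputs an estimate of P(y=1 | x), it picks nonnegative weights summing to one by minimizing the squared error between the weighted vote and the chunk's labels, using projected gradient descent with a sort-based simplex projection. The function names, the squared-error objective, the simplex constraint, and the 0.5 decision threshold are hypothetical choices, not the paper's OWA method.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = ks[u - (css - 1.0) / ks > 0][-1]  # largest k keeping u_k above the shift
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def ensemble_weights(scores, y, n_iter=2000):
    """Hypothetical stand-in, not the paper's OWA objective.

    scores: (n_samples, n_classifiers); column m holds classifier m's estimate
            of P(y=1 | x) on the most recent labeled chunk.
    y:      (n_samples,) 0/1 labels of that chunk.
    Minimizes ||scores @ w - y||^2 subject to w >= 0 and sum(w) = 1.
    """
    m = scores.shape[1]
    w = np.full(m, 1.0 / m)  # start from uniform voting
    lipschitz = 2.0 * np.linalg.eigvalsh(scores.T @ scores)[-1]
    for _ in range(n_iter):
        grad = 2.0 * scores.T @ (scores @ w - y)
        w = project_to_simplex(w - grad / lipschitz)
    return w


def predict(scores, w, threshold=0.5):
    """Label 1 when the weighted ensemble probability reaches the threshold."""
    return (scores @ w >= threshold).astype(int)
```

Constraining the weights to the simplex keeps the ensemble output a convex combination of member probabilities, so it stays in [0, 1] and can be thresholded directly when predicting the yet-to-come chunk.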