A framework for application-driven classification of data streams

Authors:
Peng Zhang; Byron J. Gao; Ping Liu; Yong Shi; Li Guo
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China and Department of Computer Science, Texas State University, San Marcos, TX 78666, USA; Department of Computer Science, Texas State University, San Marcos, TX 78666, USA; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; FEDS Center, Graduate University, Chinese Academy of Sciences, Beijing 100190, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Venue:
Neurocomputing
Year:
2012

Abstract

Data stream classification has drawn increasing attention from the data mining community in recent years. Relevant applications include network traffic monitoring, sensor network data analysis, Web click stream mining, power consumption measurement, and dynamic tracing of stock fluctuations, to name a few. Data stream classification in such real-world applications typically faces three major challenges: concept drift, large data volumes, and partial labeling. As a result, training examples in data streams can be very diverse, and it is hard to learn accurate models efficiently. In this paper, we propose a novel framework that first categorizes diverse training examples into four types and assigns learning priorities to them. We then derive four learning cases based on the proportion and priority of the different types of training examples. Finally, for each learning case, we employ one of four SVM-based training models: classical SVM, semi-supervised SVM, transfer semi-supervised SVM, and relational k-means transfer semi-supervised SVM. We perform comprehensive experiments on real-world data streams that validate the utility of our approach.
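
The case-to-model dispatch described in the abstract might be sketched roughly as follows. This is an illustration only, not the paper's method: the abstract does not name the four example types or state the rules that define the learning cases, so the type names (`labeled_similar`, `unlabeled_similar`, `labeled_dissimilar`, `unlabeled_dissimilar`), the `enough_labeled` threshold, and the selection rules below are all assumptions. Only the four model names come from the abstract.

```python
from enum import Enum


class Model(Enum):
    # The four SVM-based training models named in the abstract.
    SVM = "classical SVM"
    S3VM = "semi-supervised SVM"
    TS3VM = "transfer semi-supervised SVM"
    RK_TS3VM = "relational k-means transfer semi-supervised SVM"


def choose_model(counts, enough_labeled=100):
    """Pick one training model from per-type example counts.

    ASSUMPTION: the dictionary keys, the threshold, and the rules
    are hypothetical stand-ins for the paper's four example types
    and four learning cases, which the abstract does not spell out.
    """
    labeled_similar = counts.get("labeled_similar", 0)
    labeled_dissimilar = counts.get("labeled_dissimilar", 0)
    unlabeled_dissimilar = counts.get("unlabeled_dissimilar", 0)

    # Hypothetical case 1: plenty of labeled, on-concept examples.
    if labeled_similar >= enough_labeled:
        return Model.SVM
    # Hypothetical case 2: only on-concept examples, mostly unlabeled.
    if labeled_dissimilar == 0 and unlabeled_dissimilar == 0:
        return Model.S3VM
    # Hypothetical case 3: labeled off-concept examples are also present.
    if unlabeled_dissimilar == 0:
        return Model.TS3VM
    # Hypothetical case 4: all remaining mixtures of example types.
    return Model.RK_TS3VM


# A chunk with few labeled and many unlabeled on-concept examples.
chunk = {"labeled_similar": 12, "unlabeled_similar": 900}
print(choose_model(chunk).value)  # prints "semi-supervised SVM"
```

The point of the sketch is the shape of the framework: count the example types in the current chunk, map the counts to a single learning case, and train exactly one model for that case. The actual case definitions and priorities would have to be taken from the full paper.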