Concurrent semi-supervised learning of data streams

Authors:
Hai-Long Nguyen;Wee-Keong Ng;Yew-Kwong Woon;Duc H. Tran
Affiliations:
Nanyang Technological University, Singapore;Nanyang Technological University, Singapore;EADS Innovation Works Singapore;Nanyang Technological University, Singapore
Venue:
DaWaK'11 Proceedings of the 13th international conference on Data warehousing and knowledge discovery
Year:
2011

Citing 15
Cited 0

Mining time-changing data streams. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining.
A streaming ensemble algorithm (SEA) for large-scale classification. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining.
Constrained K-means Clustering with Background Knowledge. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning.
Mining concept-drifting data streams using ensemble classifiers. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.
A probabilistic framework for semi-supervised clustering. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.
A Framework for On-Demand Classification of Evolving Data Streams. IEEE Transactions on Knowledge and Data Engineering.
Density-based clustering for real-time stream data. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining.
Learning drifting concepts: Example selection vs. example weighting. Intelligent Data Analysis.
A framework for clustering evolving data streams. VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29.
Grid-based subspace clustering over data streams. Proceedings of the sixteenth ACM conference on Conference on information and knowledge management.
Tracking clusters in evolving data streams over sliding windows. Knowledge and Information Systems.
A Practical Approach to Classify Evolving Data Streams: Training with Limited Amount of Labeled Data. ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining.
A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval.
Introduction to Semi-Supervised Learning.
MOA: Massive Online Analysis. The Journal of Machine Learning Research.


Abstract

Conventional stream mining algorithms focus on single, stand-alone mining tasks. Given the single-pass nature of data streams, it makes sense to maximize throughput by performing multiple complementary mining tasks concurrently. We investigate the potential of concurrent semi-supervised learning on data streams and propose CSL-Stream (Concurrent Semi-supervised Learning of Data Streams), an incremental algorithm that performs clustering and classification at the same time. Experiments on common synthetic and real datasets show that CSL-Stream outperforms prominent stream algorithms (D-Stream for clustering and SmSCluster for classification) in terms of accuracy, speed, and scalability. The success of CSL-Stream paves the way for a new research direction: understanding the latent commonalities among data mining tasks in order to exploit the power of concurrent stream mining.
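The abstract does not describe how CSL-Stream shares work between its two tasks. As a hedged illustration only, the sketch below assumes a grid-based synopsis loosely in the spirit of grid-based stream clustering such as D-Stream: each arriving point is read once and updates both a decayed density count and a label count for its grid cell; clusters are connected groups of dense cells, and a point is classified by the majority label found in its cluster. The class `GridSynopsis` and its parameters (`cell_size`, `decay`, `dense_threshold`) are hypothetical and are not the authors' CSL-Stream implementation.

```python
import math
from collections import Counter, defaultdict


class GridSynopsis:
    """Hypothetical shared grid synopsis (illustration only, not CSL-Stream itself).

    Each arriving point is read once and updates its cell's decayed density and,
    if the point is labeled, the cell's label counts. Clustering and
    classification are then both answered from this one structure.
    """

    def __init__(self, cell_size=1.0, decay=0.99, dense_threshold=2.0):
        self.cell_size = cell_size              # width of a grid cell in every dimension (assumed)
        self.decay = decay                      # per-arrival decay factor for density (assumed)
        self.dense_threshold = dense_threshold  # decayed density needed for a "dense" cell (assumed)
        self.density = defaultdict(float)       # cell -> density as of its last update
        self.last_t = {}                        # cell -> arrival index of its last update
        self.labels = defaultdict(Counter)      # cell -> label counts (not decayed in this sketch)
        self.t = 0                              # number of points seen so far

    def _cell(self, x):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(math.floor(v / self.cell_size) for v in x)

    def _decayed(self, cell):
        # Apply exponential decay lazily for the arrivals since the cell was last touched.
        return self.density[cell] * self.decay ** (self.t - self.last_t.get(cell, self.t))

    def update(self, x, label=None):
        # Single pass: one call per stream point, labeled or unlabeled.
        self.t += 1
        c = self._cell(x)
        self.density[c] = self._decayed(c) + 1.0
        self.last_t[c] = self.t
        if label is not None:
            self.labels[c][label] += 1

    def clusters(self):
        # Clusters are connected components of dense cells; two cells are
        # neighbors when they differ by exactly 1 in exactly one dimension.
        dense = {c for c in self.density if self._decayed(c) >= self.dense_threshold}
        comp, next_id = {}, 0
        for start in dense:
            if start in comp:
                continue
            comp[start] = next_id
            stack = [start]
            while stack:
                c = stack.pop()
                for d in range(len(c)):
                    for step in (-1, 1):
                        n = c[:d] + (c[d] + step,) + c[d + 1:]
                        if n in dense and n not in comp:
                            comp[n] = next_id
                            stack.append(n)
            next_id += 1
        return comp

    def classify(self, x):
        # Predict the majority label among labeled points in x's cluster, so a
        # few labels are shared with every cell of the same cluster.
        comp = self.clusters()
        c = self._cell(x)
        if c not in comp:
            return None
        votes = Counter()
        for cell, cid in comp.items():
            if cid == comp[c]:
                votes.update(self.labels.get(cell, Counter()))
        return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    syn = GridSynopsis(cell_size=1.0, decay=1.0, dense_threshold=2.0)
    stream = [((0.2, 0.3), "a"), ((0.5, 0.1), None), ((1.4, 0.2), None),
              ((1.6, 0.8), None), ((5.1, 5.2), "b"), ((5.3, 5.9), None)]
    for x, y in stream:
        syn.update(x, y)
    print(len(set(syn.clusters().values())))  # 2
    print(syn.classify((1.5, 0.5)))           # a
```

Because both tasks read the same synopsis, one pass over the stream serves clustering and classification at once, and only two of the six points carry labels. A practical system would also need to decay label counts, prune sparse cells, and avoid recomputing components on every query, all of which this sketch omits.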