Concurrent semi-supervised learning of data streams

  • Authors:
  • Hai-Long Nguyen; Wee-Keong Ng; Yew-Kwong Woon; Duc H. Tran

  • Affiliations:
  • Nanyang Technological University, Singapore; Nanyang Technological University, Singapore; EADS Innovation Works Singapore; Nanyang Technological University, Singapore

  • Venue:
  • DaWaK'11: Proceedings of the 13th International Conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2011

Abstract

Conventional stream mining algorithms focus on single and stand-alone mining tasks. Given the single-pass nature of data streams, it makes sense to maximize throughput by performing multiple complementary mining tasks concurrently. We investigate the potential of concurrent semi-supervised learning on data streams and propose an incremental algorithm called CSL-Stream (Concurrent Semi-supervised Learning of Data Streams) that performs clustering and classification at the same time. Experiments using common synthetic and real datasets show that CSL-Stream outperforms prominent clustering and classification algorithms (D-Stream and SmSCluster) in terms of accuracy, speed and scalability. The success of CSL-Stream paves the way for a new research direction in understanding latent commonalities among various data mining tasks in order to exploit the power of concurrent stream mining.