A framework for application-driven classification of data streams

  • Authors:
  • Peng Zhang; Byron J. Gao; Ping Liu; Yong Shi; Li Guo

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China and Department of Computer Science, Texas State University, San Marcos, TX 78666, USA; Department of Computer Science, Texas State University, San Marcos, TX 78666, USA; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; FEDS Center, Graduate University, Chinese Academy of Sciences, Beijing 100190, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

  • Venue:
  • Neurocomputing
  • Year:
  • 2012

Abstract

Data stream classification has drawn increasing attention from the data mining community in recent years. Relevant applications include network traffic monitoring, sensor network data analysis, Web click stream mining, power consumption measurement, and dynamic tracing of stock fluctuations, to name a few. Data stream classification in such real-world applications typically faces three major challenges: concept drift, large data volumes, and partial labeling. As a result, training examples in data streams can be very diverse, and it is hard to learn accurate models efficiently. In this paper, we propose a novel framework that first categorizes diverse training examples into four types and assigns learning priorities to them. Then, we derive four learning cases based on the proportion and priority of the different types of training examples. Finally, for each learning case, we employ one of four SVM-based training models: classical SVM, semi-supervised SVM, transfer semi-supervised SVM, and relational k-means transfer semi-supervised SVM. Comprehensive experiments on real-world data streams validate the utility of our approach.
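
The three-step pipeline described in the abstract (categorize examples, derive a learning case, pick a model) can be pictured with a minimal sketch. The abstract does not define the four example types, the priorities, or the rules that separate the learning cases, so everything below is an assumption made for illustration: the labeled/unlabeled and current/earlier-concept taxonomy, the names `ExampleType`, `categorize`, and `choose_model`, and the `enough` threshold are all hypothetical. Only the four model names come from the abstract.

```python
from collections import Counter
from enum import Enum


class ExampleType(Enum):
    # Hypothetical taxonomy (not from the abstract): labeled/unlabeled
    # crossed with current/earlier concept. The enum value doubles as an
    # assumed learning priority, with 1 as the highest.
    LABELED_CURRENT = 1
    LABELED_EARLIER = 2
    UNLABELED_CURRENT = 3
    UNLABELED_EARLIER = 4


def categorize(is_labeled: bool, is_current: bool) -> ExampleType:
    """Map one training example to one of the four assumed types."""
    if is_labeled:
        return ExampleType.LABELED_CURRENT if is_current else ExampleType.LABELED_EARLIER
    return ExampleType.UNLABELED_CURRENT if is_current else ExampleType.UNLABELED_EARLIER


def choose_model(types, enough: float = 0.5) -> str:
    """Pick a model name from the mix of example types in one chunk.

    The decision rules and the `enough` threshold are illustrative
    assumptions, not the criteria used in the paper.
    """
    types = list(types)
    if not types:
        raise ValueError("cannot choose a model for an empty chunk")
    counts = Counter(types)
    share = {t: counts[t] / len(types) for t in ExampleType}

    # Higher-priority types are checked first.
    if share[ExampleType.LABELED_CURRENT] >= enough:
        return "classical SVM"
    if share[ExampleType.LABELED_CURRENT] + share[ExampleType.UNLABELED_CURRENT] >= enough:
        return "semi-supervised SVM"
    if share[ExampleType.LABELED_EARLIER] > 0:
        return "transfer semi-supervised SVM"
    return "relational k-means transfer semi-supervised SVM"


# Usage: a chunk with 2 labeled and 8 unlabeled examples of the current concept.
chunk = [categorize(True, True)] * 2 + [categorize(False, True)] * 8
print(choose_model(chunk))  # semi-supervised SVM
```

The function returns only a model name, which keeps the sketch independent of any particular SVM library; a real system would map each name to a trainer.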