One-class learning and concept summarization for data streams

  • Authors:
  • Xingquan Zhu; Wei Ding; Philip S. Yu; Chengqi Zhang

  • Affiliations:
  • Univ. of Technology, Centre for Quantum Computation and Intelligent Systems, Faculty of Eng. and Information Technology, 2007, Sydney, NSW, Australia and Florida Atlantic University, Dept. of Comp ...; University of Massachusetts Boston, Department of Computer Science, 02125, Boston, MA, USA; University of Illinois at Chicago, Department of Computer Science, 60680, Chicago, IL, USA; Univ. of Technology, Centre for Quantum Computation and Intelligent Systems, Faculty of Eng. and Information Technology, 2007, Sydney, NSW, Australia

  • Venue:
  • Knowledge and Information Systems - Special Issue on Data Warehousing and Knowledge Discovery from Sensors and Streams
  • Year:
  • 2011

Abstract

In this paper, we formulate a new research problem: concept learning and summarization for one-class data streams. The main objectives are to (1) allow users to label instance groups, instead of single instances, as positive samples for learning, and (2) summarize the concepts labeled by users over the whole stream. Batch labeling raises serious issues for stream-oriented concept learning and summarization, because a labeled instance group may contain non-positive samples and users may change their labeling interests at any time. As a result, the positive samples labeled by users over the whole stream may be inconsistent and may contain multiple concepts. To resolve these issues, we propose a one-class learning and summarization (OCLS) framework with two major components. The first component is a vague one-class learning (VOCL) module, which learns concepts from data streams using an ensemble of classifiers with instance-level and classifier-level weighting strategies. The second component is a one-class concept summarization (OCCS) module, which uses clustering techniques and a Markov model to summarize the concepts labeled by users in a single scan of the stream data. Experimental results on synthetic and real-world data streams demonstrate that the proposed VOCL module outperforms its peers in learning concepts from vaguely labeled stream data. The OCCS module is also able to rebuild a high-level summary of the concepts marked by users over the stream.
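
The abstract does not say how the Markov model in the summarization step is built. As a rough illustration only, and not the paper's OCCS algorithm, the sketch below assumes each stream chunk has already been assigned a concept (cluster) identifier, and estimates first-order transition probabilities between concepts in one pass over that sequence. The function name `transition_summary` and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def transition_summary(concept_ids):
    """Estimate first-order Markov transition probabilities in one pass.

    `concept_ids` is an iterable of concept (cluster) identifiers, one per
    stream chunk, in arrival order. This input format is an assumption made
    for illustration.
    """
    counts = defaultdict(lambda: defaultdict(int))
    prev = None
    for cid in concept_ids:
        if prev is not None:
            counts[prev][cid] += 1  # count the transition prev -> cid
        prev = cid

    # Normalize each row of counts into probabilities.
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


# Hypothetical sequence of concept identifiers for six consecutive chunks.
chunk_concepts = ["A", "A", "B", "A", "B", "B"]
summary = transition_summary(chunk_concepts)
```

Only the previous identifier and the count table are kept in memory, so the summary can be updated as chunks arrive without revisiting earlier data.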