One-Class Classification of Text Streams with Concept Drift

  • Authors:
  • Yang Zhang; Xue Li; Maria Orlowska

  • Venue:
  • ICDMW '08 Proceedings of the 2008 IEEE International Conference on Data Mining Workshops
  • Year:
  • 2008

Abstract

Research on streaming data classification has mostly assumed that the data can be fully labelled. This assumption is impractical for three reasons. First, a complete labelling cannot be produced before all the data has arrived. Second, obtaining fully labelled data through manual effort is generally very expensive. Third, user interests may change over time, so labels issued earlier may be inconsistent with labels issued later; this is concept drift. In this paper, we consider one-class classification of text streams under concept drift, where a large volume of documents arrives at high speed while user interests and the data distribution change. In this setting, only a small number of positively labelled documents are available for training. We propose a stacking-style ensemble-based approach and compare it with window-based approaches, such as the single window, fixed window, and full memory approaches. Our experimental results show that the proposed ensemble approach outperforms the window-based approaches it was compared with.
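
The three window-based baselines named in the abstract differ mainly in which past batches of the stream are used to retrain the classifier. The sketch below illustrates that difference only; it is not the authors' implementation. It assumes the usual meaning of the terms in the concept-drift literature (single window: the most recent batch only; fixed window: the last k batches; full memory: every batch seen so far), which the abstract itself does not spell out. The function name select_training_batches, the policy strings, and the window_size parameter are hypothetical.

```python
from typing import List, Sequence, TypeVar

Batch = TypeVar("Batch")  # one batch of stream documents, in any representation


def select_training_batches(
    batches: Sequence[Batch], policy: str, window_size: int = 3
) -> List[Batch]:
    """Return the batches a window-based baseline would train on.

    `batches` is ordered oldest to newest. The policy names and their
    definitions are assumptions based on common usage, not the paper.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    if not batches:
        return []
    if policy == "single_window":
        # Train only on the most recent batch.
        return [batches[-1]]
    if policy == "fixed_window":
        # Train on the last `window_size` batches (fewer if the stream is short).
        return list(batches[-window_size:])
    if policy == "full_memory":
        # Train on everything seen so far.
        return list(batches)
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    stream = ["b1", "b2", "b3", "b4", "b5"]
    for name in ("single_window", "fixed_window", "full_memory"):
        print(name, select_training_batches(stream, name, window_size=2))
```

Under these assumed definitions, a small window forgets outdated user interests quickly but trains on very few positive documents, while full memory keeps more training data at the cost of retaining labels that may no longer reflect the current concept.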