StreamMiner: a classifier ensemble-based engine to mine concept-drifting data streams

  • Authors:
  • Wei Fan

  • Affiliations:
  • IBM T.J. Watson Research, Hawthorne, NY

  • Venue:
  • VLDB '04: Proceedings of the Thirtieth International Conference on Very Large Data Bases, Volume 30
  • Year:
  • 2004

Abstract

We demonstrate StreamMiner, a random decision-tree ensemble-based engine for mining data streams. A fundamental challenge in data stream mining applications (e.g., credit card transaction authorization, securities buy-sell transactions, and phone call records) is concept drift, that is, the discrepancy between the previously learned model and the true model underlying the new data. The core problem is to judiciously select data and adapt the old model so that it accurately matches the changed concept of the data stream. StreamMiner uses several techniques to support mining over data streams with possible concept drift. We demonstrate the following two key functionalities of StreamMiner: 1. Detecting possible concept drift on the fly, while the trained streaming model classifies incoming data without knowing the ground truth. 2. Systematically selecting among old and new data chunks to compute the model that best fits the changing data stream.
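The abstract does not say how drift is detected without ground truth. One common label-free idea, shown here only as a minimal sketch and not as StreamMiner's actual mechanism, is to compare the distribution of the ensemble's averaged probability estimates on an incoming chunk against a reference chunk and flag a large shift. The model interface (`Model`), the helper names, the 10-bin histogram, the total variation distance, and the 0.2 threshold are all illustrative assumptions.

```python
from typing import Callable, List, Sequence

# Hypothetical model type: maps a feature vector to an estimated P(y = 1 | x).
# Illustrative assumption only, not StreamMiner's published algorithm.
Model = Callable[[Sequence[float]], float]


def ensemble_scores(models: Sequence[Model], xs: Sequence[Sequence[float]]) -> List[float]:
    """Average the members' probability estimates for each unlabeled example."""
    if not models:
        raise ValueError("ensemble must contain at least one model")
    return [sum(m(x) for m in models) / len(models) for x in xs]


def score_histogram(scores: Sequence[float], bins: int = 10) -> List[float]:
    """Normalized histogram of scores in [0, 1] using equal-width bins."""
    if not scores:
        raise ValueError("need at least one score")
    counts = [0] * bins
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
        counts[min(int(s * bins), bins - 1)] += 1  # s == 1.0 falls in the last bin
    return [c / len(scores) for c in counts]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def drift_suspected(ref_scores: Sequence[float], new_scores: Sequence[float],
                    threshold: float = 0.2, bins: int = 10) -> bool:
    """Flag possible drift when the score distribution moves by more than `threshold`."""
    distance = total_variation(score_histogram(ref_scores, bins),
                               score_histogram(new_scores, bins))
    return distance > threshold


if __name__ == "__main__":
    # Toy one-feature "trees": hypothetical stand-ins for trained random trees.
    models = [lambda x: min(1.0, max(0.0, x[0])),
              lambda x: min(1.0, max(0.0, x[0] + 0.05))]
    ref = ensemble_scores(models, [[0.1], [0.12], [0.85], [0.9]])
    new = ensemble_scores(models, [[0.5], [0.52], [0.55], [0.5]])
    print(drift_suspected(ref, new))
```

Because no labels are used, a flagged shift indicates that the scores the model produces have changed, not a confirmed loss of accuracy.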