The algorithm APT to classify in concurrence of latency and drift

  • Authors:
  • Georg Krempl

  • Affiliations:
  • University of Graz, Department of Statistics and Operations Research, Graz, Austria

  • Venue:
  • IDA'11 Proceedings of the 10th international conference on Advances in intelligent data analysis X
  • Year:
  • 2011

Abstract

Population drift, a change in probability distributions over time, is a challenging problem in classification. Known drift-adaptive classification methods such as incremental learning rely on current, labelled data for classification model updates, assuming that such labelled data are available without verification latency. However, verification latency is a relevant problem in some application domains, where predictions have to be made far into the future. This concurrence of drift and latency requires new approaches in machine learning. We propose a two-stage learning strategy: First, the nature of drift in temporal data needs to be identified. This requires the formulation of explicit drift models for the underlying data generating process. In a second step, these models are used to substitute for scarce labelled data when updating classification models. This paper contributes an explicit drift model that characterises a mixture of independently evolving sub-populations. In this model, the joint distribution is a mixture of arbitrarily distributed sub-populations drifting over time. An arbitrary sub-population tracker (APT) algorithm is presented, which can track and predict the distributions using unlabelled data. Experimental evaluation shows that the presented APT algorithm is capable of tracking and predicting changes in the posterior distribution of class labels accurately.
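
The setting described in the abstract can be illustrated with a small toy simulation. The sketch below is not the APT algorithm from the paper. It assumes one-dimensional Gaussian sub-populations, one sub-population per class, a constant drift per time step, and a simple nearest-centre tracking rule; all function names and parameters (`make_batch`, `track`, `run_demo`, `drift`, `noise`) are hypothetical. Labels are available only at time 0, and later batches are used without their labels for tracking.

```python
import random


def make_batch(centers, n_per_group, noise, rng):
    """Draw (value, true_label) pairs around each sub-population centre."""
    return [(rng.gauss(c, noise), label)
            for label, c in enumerate(centers)
            for _ in range(n_per_group)]


def nearest(centers, x):
    """Index of the centre closest to x (used as the predicted class)."""
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))


def track(centers, values):
    """One tracking step on unlabelled values: assign each value to its
    nearest centre, then re-estimate each centre as the mean of its values."""
    groups = [[] for _ in centers]
    for x in values:
        groups[nearest(centers, x)].append(x)
    # Keep the old centre if no value was assigned to it.
    return [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]


def accuracy(centers, batch):
    """Fraction of points whose nearest centre matches the true label."""
    return sum(nearest(centers, x) == y for x, y in batch) / len(batch)


def run_demo(steps=5, drift=1.0, noise=0.3, n=50, seed=0):
    rng = random.Random(seed)
    true_centers = [0.0, 5.0]  # assumed starting positions of the two classes

    # Time 0: the only labelled data ever seen.
    labelled = make_batch(true_centers, n, noise, rng)
    initial = []
    for k in range(len(true_centers)):
        xs = [x for x, y in labelled if y == k]
        initial.append(sum(xs) / len(xs))

    tracked = list(initial)
    batch = labelled
    for _ in range(steps):
        # Both sub-populations drift; labels of the new batch are hidden
        # from the tracker and used only for the final evaluation.
        true_centers = [c + drift for c in true_centers]
        batch = make_batch(true_centers, n, noise, rng)
        tracked = track(tracked, [x for x, _ in batch])

    # Compare the never-updated model with the tracked one on the last batch.
    return accuracy(initial, batch), accuracy(tracked, batch)


if __name__ == "__main__":
    static_acc, tracked_acc = run_demo()
    print("static model accuracy: ", static_acc)
    print("tracked model accuracy:", tracked_acc)
```

Under these assumptions the static model degrades as the sub-populations move away from their time-0 positions, while the tracked centres follow them using unlabelled data only. The toy rule works here because the drift per step is small relative to the distance between the sub-populations; the abstract's model, with arbitrarily distributed sub-populations, is more general than this sketch.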