The algorithm APT to classify in concurrence of latency and drift

Authors:
Georg Krempl
Affiliations:
University of Graz, Department of Statistics and Operations Research, Graz, Austria
Venue:
IDA'11 Proceedings of the 10th international conference on Advances in intelligent data analysis X
Year:
2011

Abstract

Population drift is a challenging problem in classification; it denotes changes in probability distributions over time. Known drift-adaptive classification methods such as incremental learning rely on current, labelled data for classification model updates, assuming that such labelled data are available without verification latency. However, verification latency is a relevant problem in some application domains, where predictions have to be made far into the future. This concurrence of drift and latency requires new approaches in machine learning. We propose a two-stage learning strategy. First, the nature of drift in temporal data needs to be identified, which requires the formulation of explicit drift models for the underlying data-generating process. Second, these models are used to substitute for scarce labelled data when updating classification models. This paper contributes an explicit drift model that characterises a mixture of independently evolving sub-populations: the joint distribution is a mixture of arbitrarily distributed sub-populations drifting over time. An arbitrary sub-population tracker (APT) algorithm is presented, which can track and predict the distributions using unlabelled data. Experimental evaluation shows that the presented APT algorithm is capable of tracking and predicting changes in the posterior distribution of class labels accurately.
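
The two-stage idea described in the abstract can be illustrated with a toy example. The sketch below is not the APT algorithm from the paper; it is a minimal one-dimensional stand-in that assumes two Gaussian sub-populations with linearly drifting means, a simple nearest-centroid update in place of the paper's tracking step, and linear extrapolation as the drift model. All names (`track_means`, `extrapolate`, `classify`) and all parameter values are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

def track_means(prev_means, batch):
    """Nearest-centroid update on an unlabelled batch (assumed tracking step)."""
    groups = [[] for _ in prev_means]
    for x in batch:
        k = min(range(len(prev_means)), key=lambda j: abs(x - prev_means[j]))
        groups[k].append(x)
    # Keep the previous mean if a sub-population received no points.
    return [mean(g) if g else m for g, m in zip(groups, prev_means)]

def extrapolate(m_prev, m_curr, steps):
    """Assumed drift model: each mean keeps moving at its last observed rate."""
    return [c + steps * (c - p) for p, c in zip(m_prev, m_curr)]

def classify(x, means, labels):
    """Assign the class label of the sub-population with the closest mean."""
    k = min(range(len(means)), key=lambda j: abs(x - means[j]))
    return labels[k]

rng = random.Random(0)
labels = ["A", "B"]        # class of each sub-population, known only at t = 0
true_means = [0.0, 10.0]   # hypothetical starting means
drift = [1.0, -0.5]        # hypothetical drift per time step

# Stage 1: follow the sub-populations through unlabelled batches at t = 1..3.
history = [list(true_means)]  # assume the t = 0 means come from labelled data
for t in range(1, 4):
    batch = [rng.gauss(m + t * d, 0.3)
             for m, d in zip(true_means, drift)
             for _ in range(200)]
    history.append(track_means(history[-1], batch))

# Stage 2: predict the means at t = 5 and classify without any new labels.
predicted = extrapolate(history[-2], history[-1], steps=2)
stale = classify(5.2, history[0], labels)    # model frozen at t = 0
updated = classify(5.2, predicted, labels)   # model updated via the drift model
```

In this toy setting a point at 5.2 observed at t = 5 is closer to the stale mean of sub-population B, but closer to the extrapolated mean of sub-population A, so the frozen and the drift-updated classifiers disagree. The paper's model allows arbitrarily distributed sub-populations; the Gaussian, one-dimensional setup here is a simplifying assumption.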