Population drift is a challenging problem in classification; it denotes changes in probability distributions over time. Known drift-adaptive classification methods, such as incremental learning, rely on current, labelled data for classification model updates, assuming that such labelled data are available without verification latency. However, verification latency is a relevant problem in some application domains, where predictions have to be made far into the future. This concurrence of drift and latency requires new approaches in machine learning. We propose a two-stage learning strategy. First, the nature of drift in temporal data needs to be identified, which requires the formulation of explicit drift models for the underlying data-generating process. Second, these models are used to substitute for scarce labelled data when updating classification models. This paper contributes an explicit drift model that characterises a mixture of independently evolving sub-populations: the joint distribution is a mixture of arbitrarily distributed sub-populations drifting over time. An arbitrary sub-population tracker (APT) algorithm is presented, which tracks and predicts these distributions using unlabelled data. Experimental evaluation shows that the APT algorithm is capable of accurately tracking and predicting changes in the posterior distribution of class labels.
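The two-stage idea can be illustrated with a minimal sketch. This is not the APT algorithm itself, whose details are not given here; it is a simplified stand-in that assumes spherical sub-populations, one class label per sub-population, small drift between time steps, and a nearest-centroid update in place of a full density tracker. The function names `track_subpopulations` and `predict`, and all data values, are hypothetical.

```python
import numpy as np

def track_subpopulations(centroids, unlabelled, n_iter=10):
    """Move each sub-population centre to the mean of the unlabelled
    points currently nearest to it (a k-means-style update)."""
    centroids = np.asarray(centroids, dtype=float).copy()
    unlabelled = np.asarray(unlabelled, dtype=float)
    for _ in range(n_iter):
        # Squared distance from every point to every centre.
        d2 = ((unlabelled[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        for k in range(len(centroids)):
            members = unlabelled[nearest == k]
            if len(members) > 0:  # keep the old centre if no points are assigned
                centroids[k] = members.mean(axis=0)
    return centroids

def predict(centroids, centroid_labels, x):
    """Label each point with the class of its nearest sub-population."""
    x = np.asarray(x, dtype=float)
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(centroid_labels)[d2.argmin(axis=1)]

rng = np.random.default_rng(0)

# Stage 1 (assumed already done): sub-populations learned from old labelled data.
old_centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
centroid_labels = np.array([0, 1])

# Later, only unlabelled data are available, and each sub-population
# has drifted independently of the other.
drifted_a = rng.normal(old_centroids[0] + [1.0, 0.0], 0.3, size=(200, 2))
drifted_b = rng.normal(old_centroids[1] + [0.0, -1.0], 0.3, size=(200, 2))
unlabelled = np.vstack([drifted_a, drifted_b])
true_labels = np.repeat([0, 1], 200)  # used only to score the sketch

# Stage 2: update the model from unlabelled data, carrying class labels over.
new_centroids = track_subpopulations(old_centroids, unlabelled)
accuracy = (predict(new_centroids, centroid_labels, unlabelled) == true_labels).mean()
```

The class labels are attached to sub-populations rather than to individual points, which is what allows the classifier to be updated without new labelled data under these assumptions.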