We study the evaluation of online classifiers that are designed to adapt to changes in the data distribution over time (concept drift). A standard procedure for evaluating such classifiers is test-then-train, which iteratively uses each incoming instance first for testing and then for updating the classifier. Comparing classifiers based on such a test risks giving biased results, since the dataset is processed only once, in a fixed sequential order. Such a test assesses how well classifiers adapt when changes happen at fixed time points, whereas the ultimate goal is to assess how well they would adapt when changes of a similar type happen unexpectedly. To reduce the risk of biased evaluation, we propose running multiple tests on permuted copies of the data. A fully random permutation is not suitable, as it makes the data distribution uniform over time and destroys the adaptive learning problem. We develop three permutation techniques with theoretical control mechanisms that ensure the distinct distributions present in the data are preserved while the data order is perturbed. The idea is to manipulate blocks of data while keeping individual instances close together. Our permutations reduce the risk of biased evaluation by making it possible to analyze the sensitivity of classifiers to variations in the data order.
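
As a rough illustration, the Python fragment below is a minimal sketch of test-then-train evaluation combined with a simple block-level permutation. It assumes a classifier object with hypothetical predict(x) and update(x, y) methods, and the plain shuffle of equally sized blocks shown here is only the simplest block manipulation; it does not reproduce the three controlled permutation techniques or their theoretical control mechanisms described above.

import random

def test_then_train(stream, classifier):
    # Test-then-train (prequential) evaluation: each incoming instance
    # is first used to test the classifier, then to update it.
    correct, total = 0, 0
    for x, y in stream:
        if classifier.predict(x) == y:  # test first ...
            correct += 1
        classifier.update(x, y)         # ... then train on the instance
        total += 1
    return correct / total if total else 0.0

def block_permutation(data, block_size, rng=None):
    # Shuffle consecutive blocks rather than individual instances, so
    # that instances stay close to their original neighbours and local
    # distributions are largely preserved while the global order changes.
    rng = rng or random.Random()
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    rng.shuffle(blocks)
    return [inst for block in blocks for inst in block]

Running the evaluation on several permuted copies, e.g. [test_then_train(block_permutation(data, 500), make_classifier()) for _ in range(10)] with a hypothetical make_classifier factory, yields a spread of accuracies from which the sensitivity of a classifier to variations in the data order can be judged.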