We study the evaluation of online classifiers that are designed to adapt to changes in the data distribution over time (concept drift). A standard procedure for evaluating such classifiers is test-then-train, which iteratively uses each incoming instance first for testing and then for updating the classifier. Comparing classifiers based on such a test risks giving biased results, since the dataset is processed only once, in a fixed sequential order. Such a test assesses how well classifiers adapt when changes happen at fixed time points, whereas the ultimate goal is to assess how well they would adapt when changes of a similar type happen unexpectedly. To reduce the risk of biased evaluation, we propose running multiple tests on permuted copies of the data. A fully random permutation is not suitable, as it makes the data distribution uniform over time and destroys the adaptive learning problem. We develop three permutation techniques with theoretical control mechanisms that ensure the distinct distributions present in the data are preserved while the data order is perturbed. The idea is to manipulate blocks of data while keeping individual instances close together. Our permutations reduce the risk of biased evaluation by making it possible to analyze the sensitivity of classifiers to variations in the data order.
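
As a rough illustration, the Python fragment below is a minimal sketch of test-then-train evaluation combined with a simple block-level permutation. It assumes a classifier object with hypothetical predict(x) and update(x, y) methods, and the plain shuffle of equally sized blocks shown here is only the simplest block manipulation; it does not reproduce the three controlled permutation techniques or their theoretical control mechanisms described above.

import random

def test_then_train(stream, classifier):
    # Test-then-train (prequential) evaluation: each incoming instance
    # is first used to test the classifier, then to update it.
    correct, total = 0, 0
    for x, y in stream:
        if classifier.predict(x) == y:  # test first ...
            correct += 1
        classifier.update(x, y)         # ... then train on the instance
        total += 1
    return correct / total if total else 0.0

def block_permutation(data, block_size, rng=None):
    # Shuffle consecutive blocks rather than individual instances, so
    # that instances stay close to their original neighbours and local
    # distributions are largely preserved while the global order changes.
    rng = rng or random.Random()
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    rng.shuffle(blocks)
    return [inst for block in blocks for inst in block]

Running the evaluation on several permuted copies, e.g. [test_then_train(block_permutation(data, 500), make_classifier()) for _ in range(10)] with a hypothetical make_classifier factory, yields a spread of accuracies from which the sensitivity of a classifier to variations in the data order can be judged.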