On the window size for classification in changing environments
Intelligent Data Analysis
We study binary online classification under sudden concept drift, that is, an abrupt change in the data-generating distribution (non-stationarity). The accuracy of a classifier trained on a mixture of old and new data is compared with the accuracy of a classifier trained only on new data, assuming the point of concept drift is known. We employ a simplified model of concept drift and derive the theoretical generalization error for the Euclidean linear classifier. Right after concept drift, the retrained classifier (trained on the mixture of old and new data) is more accurate than the new classifier (trained only on new data), especially when the data is complex (high dimensionality, low separability). The new classifier should be preferred when the extent of drift is very large.
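
The comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' drift model or derivation; it only assumes two spherical Gaussian classes whose means are both translated by `shift` along the discriminant axis at the drift point, and a nearest-mean rule as the Euclidean linear classifier. All names and parameter values (`compare`, `shift`, `dim`, `n_old`, `n_new`) are hypothetical choices made for illustration.

```python
import numpy as np

def fit_means(X, y):
    # Euclidean (nearest-mean) classifier: store the two class centroids.
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def predict(means, X):
    # Assign each point to the class with the closer centroid.
    m0, m1 = means
    d0 = np.linalg.norm(X - m0, axis=1)
    d1 = np.linalg.norm(X - m1, axis=1)
    return (d1 < d0).astype(int)

def sample(rng, n_per_class, mean0, mean1):
    # Draw n_per_class points per class from unit-variance spherical Gaussians.
    X = np.vstack([rng.normal(size=(n_per_class, mean0.size)) + mean0,
                   rng.normal(size=(n_per_class, mean1.size)) + mean1])
    y = np.repeat([0, 1], n_per_class)
    return X, y

def compare(shift, dim=20, n_old=100, n_new=5, n_test=2000, trials=100, seed=0):
    """Return (accuracy of old+new classifier, accuracy of new-only classifier).

    Assumed drift model (an illustration only): both class means move by
    `shift` along the axis that separates the classes.
    """
    rng = np.random.default_rng(seed)
    mean0 = np.zeros(dim)
    mean1 = np.zeros(dim)
    mean1[0] = 2.0                      # class separation before drift
    drift = np.zeros(dim)
    drift[0] = shift                    # extent of the sudden drift
    acc_mixed, acc_new = [], []
    for _ in range(trials):
        X_old, y_old = sample(rng, n_old, mean0, mean1)
        X_new, y_new = sample(rng, n_new, mean0 + drift, mean1 + drift)
        X_test, y_test = sample(rng, n_test, mean0 + drift, mean1 + drift)
        # Classifier trained on the mixture of old and new data.
        mixed = fit_means(np.vstack([X_old, X_new]),
                          np.concatenate([y_old, y_new]))
        # Classifier trained only on the few post-drift samples.
        new = fit_means(X_new, y_new)
        acc_mixed.append(np.mean(predict(mixed, X_test) == y_test))
        acc_new.append(np.mean(predict(new, X_test) == y_test))
    return float(np.mean(acc_mixed)), float(np.mean(acc_new))

if __name__ == "__main__":
    for s in (0.5, 4.0):
        mixed_acc, new_acc = compare(s)
        print(f"shift={s}: old+new={mixed_acc:.3f}, new only={new_acc:.3f}")
```

Under these assumptions, a small shift with few post-drift samples in a high-dimensional space tends to favour the classifier trained on old and new data, while a large shift tends to favour the classifier trained only on new data. Varying `dim` and the class separation is one way to probe the effect of data complexity.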