Concept drift has regained research interest in recent years, as many applications rely on data sources that change over time. We study text classification with logistic regression on a large news collection of 248K texts spanning seven years. We present extrinsic methods for detecting and quantifying concept drift, based on forming training sets with different windowing techniques. We characterize concept drift on a seven-year Le Monde news corpus and show that classifier performance is overestimated when drift is neglected. Finally, we lay out paths for future work: refining the extrinsic characterization methods and investigating how learning parameters drift when few examples are available.
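To make the idea of training set formation by windowing concrete, here is a minimal sketch in Python. The abstract does not say which windowing schemes were used, so the two shown here (a fixed-width sliding window and a growing window) are assumptions chosen because they are common in the concept drift literature. The function names `sliding_window` and `growing_window` and the toy list of years are hypothetical.

```python
from typing import List, Sequence


def sliding_window(periods: Sequence[int], test_period: int, width: int) -> List[int]:
    """Indices of examples from the `width` periods just before `test_period`.

    Assumed scheme: only recent history is kept, older examples are dropped.
    """
    start = test_period - width
    return [i for i, p in enumerate(periods) if start <= p < test_period]


def growing_window(periods: Sequence[int], test_period: int) -> List[int]:
    """Indices of all examples strictly older than `test_period`.

    Assumed scheme: the training set accumulates the full history.
    """
    return [i for i, p in enumerate(periods) if p < test_period]


# Toy data (hypothetical): the publication year of each document.
years = [2001, 2001, 2002, 2003, 2003, 2004]

recent_only = sliding_window(years, test_period=2004, width=1)
full_history = growing_window(years, test_period=2004)
test_docs = [i for i, y in enumerate(years) if y == 2004]

print(recent_only)   # indices of the 2003 documents
print(full_history)  # indices of every document before 2004
print(test_docs)     # indices of the 2004 documents
```

Under these assumptions, one could train the same classifier on each index set and compare its accuracy on the held-out period with the accuracy obtained from a random split that ignores time. A gap between the two would be one extrinsic indication of drift, though this is an illustration and not necessarily the procedure used in the paper.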