Automatic Document Classification (ADC) remains one of the major problems in information retrieval. It usually employs a supervised learning strategy: a classification model is first built from pre-classified documents and then used to classify unseen documents. Most supervised algorithms assume that all documents provide equally important information. In practice, however, a document may be more or less important for building the classification model depending on several factors, such as its timeliness, the venue in which it was published, and its authors, among others. In this paper, we are particularly concerned with the impact that temporal effects may have on ADC and with how to minimize that impact. To deal with these effects, we introduce a temporal weighting function (TWF) and propose a methodology to determine it for document collections. We applied the proposed methodology to ACM-DL and Medline and found that the TWF of both collections follows a lognormal distribution. We then extend three ADC algorithms (namely kNN, Rocchio and Naïve Bayes) to incorporate the TWF. Experiments showed that the temporally-aware classifiers achieved significant gains, outperforming (or at least matching) state-of-the-art algorithms.
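To make the idea of a temporally-aware classifier concrete, the following is a minimal sketch of a kNN vote scaled by a lognormal-shaped weight. It is not the paper's formulation: the abstract does not define the TWF, so the function `lognormal_twf`, its parameters (`mu`, `sigma`, `shift`), the use of the year difference as the temporal distance, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def lognormal_twf(distance, mu=0.0, sigma=1.0, shift=1.0):
    """Hypothetical TWF: lognormal density evaluated at |distance| + shift.

    The shift keeps the argument positive; mu and sigma are illustrative
    values, not parameters fitted to any collection.
    """
    x = abs(distance) + shift
    exponent = -((math.log(x) - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (x * sigma * math.sqrt(2 * math.pi))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def temporal_knn(train, query_vec, query_year, k=3, use_twf=True):
    """Classify a query by a similarity-weighted vote of its k nearest
    neighbours, optionally scaling each vote by the assumed TWF."""
    scored = [(cosine(query_vec, d["vec"]), d) for d in train]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for sim, doc in scored[:k]:
        weight = lognormal_twf(doc["year"] - query_year) if use_twf else 1.0
        votes[doc["label"]] += sim * weight
    return max(votes, key=votes.get)


# Toy collection: the same vocabulary shifts from one class to another over time.
train = [
    {"vec": {"neural": 1.0, "network": 1.0}, "year": 1990, "label": "circuits"},
    {"vec": {"neural": 1.0, "network": 1.0}, "year": 1991, "label": "circuits"},
    {"vec": {"neural": 1.0, "network": 1.0}, "year": 2009, "label": "machine_learning"},
]
query = {"neural": 1.0, "network": 1.0}

plain = temporal_knn(train, query, query_year=2010, use_twf=False)
temporal = temporal_knn(train, query, query_year=2010, use_twf=True)
print(plain, temporal)
```

In this toy setting the unweighted vote sides with the two older documents, while the weighted vote favours the single document created close to the query's date. A real TWF would be estimated from the collection, as the methodology in the abstract proposes, and analogous weighting could be applied to Rocchio centroids or Naïve Bayes term counts.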