Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Communications of the ACM
Evaluating and optimizing autonomous text classification systems
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Learning in the presence of concept drift and hidden contexts
Machine Learning
Making large-scale support vector machine learning practical
Advances in kernel methods
Context-sensitive learning methods for text categorization
ACM Transactions on Information Systems (TOIS)
Automatic Document Classification
Journal of the ACM (JACM)
Context and Page Analysis for Improved Web Search
IEEE Internet Computing
Web-Based Knowledge Management for Distributed Design
IEEE Intelligent Systems
Machine Learning
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Detecting Concept Drift with Support Vector Machines
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Feature Selection Algorithms: A Survey and Experimental Evaluation
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
An extensive empirical study of feature selection metrics for text classification
The Journal of Machine Learning Research
Adaptive Web Document Classification with MCRDR
ITCC '04 Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC'04) Volume 2 - Volume 2
Adapting ranking SVM to document retrieval
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Tackling concept drift by temporal inductive transfer
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Training linear SVMs in linear time
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Online Passive-Aggressive Algorithms
The Journal of Machine Learning Research
Learning drifting concepts: Example selection vs. example weighting
Intelligent Data Analysis
On the value of temporal information in information retrieval
ACM SIGIR Forum
An Adaptive Distributed Ensemble Approach to Mine Concept-Drifting Data Streams
ICTAI '07 Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence - Volume 02
Understanding temporal aspects in document classification
WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Boosting classifiers for drifting concepts
Intelligent Data Analysis - Knowlegde Discovery from Data Streams
Introduction to Information Retrieval
Introduction to Information Retrieval
Exploiting temporal contexts in text classification
Proceedings of the 17th ACM conference on Information and knowledge management
Combining Time and Space Similarity for Small Size Learning under Concept Drift
ISMIS '09 Proceedings of the 18th International Symposium on Foundations of Intelligent Systems
Integrating Novel Class Detection with Classification for Concept-Drifting Data Streams
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II
Collaborative filtering with temporal dynamics
Communications of the ACM
Multilingual topic models for unaligned text
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Temporally-aware algorithms for document classification
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
IEEE Transactions on Knowledge and Data Engineering
Addressing Concept-Evolution in Concept-Drifting Data Streams
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
Word co-occurrence features for text classification
Information Systems
Detecting Recurring and Novel Classes in Concept-Drifting Data Streams
ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining
Bayesian checking for topic models
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Hi-index | 0.00 |
The management of a huge and growing amount of information available nowadays makes Automatic Document Classification (ADC), besides crucial, a very challenging task. Furthermore, the dynamics inherent to classification problems, mainly on the Web, make this task even more challenging. Despite this fact, the actual impact of such temporal evolution on ADC is still poorly understood in the literature. In this context, this work concerns to evaluate, characterize and exploit the temporal evolution to improve ADC techniques. As first contribution we highlight the proposal of a pragmatical methodology for evaluating the temporal evolution in ADC domains. Through this methodology, we can identify measurable factors associated to ADC models degradation over time. Going a step further, based on such analyzes, we propose effective and efficient strategies to make current techniques more robust to natural shifts over time. We present a strategy, named temporal context selection, for selecting portions of the training set that minimize those factors. Our second contribution consists of proposing a general algorithm, called Chronos, for determining such contexts. By instantiating Chronos, we are able to reduce uncertainty and improve the overall classification accuracy. Empirical evaluations of heuristic instantiations of the algorithm, named WindowsChronos and FilterChronos, on two real document collections demonstrate the usefulness of our proposal. Comparing them against state-of-the-art ADC algorithms shows that selecting temporal contexts allows improvements on the classification accuracy up to 10%. Finally, we highlight the applicability and the generality of our proposal in practice, pointing out this study as a promising research direction.