Due to the increasing amount of information stored in and accessible through the Web, Automatic Document Classification (ADC) has become an important research topic. ADC usually follows a supervised learning strategy: a classification model is first built from pre-classified documents and then used to classify unseen documents. One major challenge in building classifiers is dealing with the temporal evolution of the characteristics of the documents and of the classes to which they belong. However, most current ADC techniques do not consider this evolution when building and using their models. Previous results show that classifier performance may be affected by three distinct temporal effects: class distribution, term distribution, and class similarity. It has also been shown that building classifiers from only portions of the pre-classified documents, which we call contexts, yields better performance, because doing so minimizes these effects. In this paper we define temporal contexts as the portions of the document collection that minimize those effects. We then propose a general algorithm for determining such contexts, discuss issues related to its implementation, and propose a heuristic that determines temporal contexts efficiently. To demonstrate the effectiveness of our strategy, we evaluated it on two distinct collections, ACM-DL and MedLine. We first measured the reduction in both the effort required to build a classifier and the entropy associated with each context. We then evaluated whether these reductions translate into better classification performance, using a very simple classifier based on majority voting.
The results show precision gains of up to 30% over a version without temporal contextualization, and accuracy matching that of a state-of-the-art classifier (SVM), with execution times up to hundreds of times faster.
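The abstract does not spell out the context-determination algorithm, the entropy measure, or the exact voting rule, so the following Python sketch only illustrates the general idea under stated assumptions. A fixed time window around the test document (a hypothetical `window` parameter) stands in for the paper's temporal-context algorithm. The entropy is taken over the class distribution of the context. "Majority voting" is assumed to mean that every context document sharing at least one term with the test document casts one vote for its class. All names (`Doc`, `temporal_context`, `class_entropy`, `majority_vote`, `classify`) and the toy data are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Doc:
    year: int                    # document timestamp (assumed: publication year)
    terms: frozenset             # bag of terms reduced to a set
    label: Optional[str] = None  # class label; None for unseen documents


def temporal_context(train: List[Doc], year: int, window: int) -> List[Doc]:
    # Hypothetical stand-in for the paper's context algorithm:
    # keep training documents within `window` years of the test document.
    return [d for d in train if abs(d.year - year) <= window]


def class_entropy(docs: List[Doc]) -> float:
    # Shannon entropy (bits) of the class distribution inside a context.
    counts = Counter(d.label for d in docs)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values()) if n else 0.0


def majority_vote(context: List[Doc], doc: Doc) -> Optional[str]:
    # Assumed voting rule: each context document sharing at least one term
    # with `doc` votes for its class; if none shares a term, all of them vote.
    voters = [d for d in context if d.terms & doc.terms] or context
    if not voters:
        return None
    # most_common breaks ties by first-encountered order.
    return Counter(d.label for d in voters).most_common(1)[0][0]


def classify(train: List[Doc], doc: Doc, window: Optional[int] = None) -> Optional[str]:
    # window=None: no temporal contextualization (use every training document).
    context = train if window is None else temporal_context(train, doc.year, window)
    return majority_vote(context, doc)


# Toy collection in which the term "web" drifts from one class to another.
train = [
    Doc(1995, frozenset({"web", "protocol"}), "networks"),
    Doc(1996, frozenset({"web", "routing"}), "networks"),
    Doc(1997, frozenset({"web", "tcp"}), "networks"),
    Doc(2008, frozenset({"web", "ranking"}), "ir"),
    Doc(2009, frozenset({"web", "search"}), "ir"),
]
test = Doc(2010, frozenset({"web", "query"}))

print(classify(train, test))            # networks
print(classify(train, test, window=3))  # ir
print(round(class_entropy(train), 3))   # 0.971
print(class_entropy(temporal_context(train, test.year, 3)))  # 0.0
```

On this toy data the non-contextualized classifier predicts `networks`, because the older documents outnumber the recent ones among the voters on the shared term `web`. The 3-year context keeps only the 2008 and 2009 documents, whose class distribution has zero entropy, and predicts `ir`. The window is only a placeholder; the paper's own algorithm and heuristic determine contexts from the three temporal effects rather than from a fixed span.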