Exploiting temporal contexts in text classification

Authors:
Leonardo Rocha;Fernando Mourão;Adriano Pereira;Marcos André Gonçalves;Wagner Meira, Jr.
Affiliations:
Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Abstract

Due to the increasing amount of information stored and accessible through the Web, Automatic Document Classification (ADC) has become an important research topic. ADC usually employs a supervised learning strategy: a classification model is first built from pre-classified documents and then used to classify unseen documents. One major challenge in building classifiers is dealing with the temporal evolution of the characteristics of the documents and of the classes to which they belong. However, most current ADC techniques do not consider this evolution while building and using the models. Previous results show that the performance of classifiers may be affected by three different temporal effects (class distribution, term distribution and class similarity). They also show that building classifiers from only portions of the pre-classified documents, which we call contexts, results in better performance because it minimizes these effects. In this paper we define temporal contexts as the portions of documents that minimize those effects. We then propose a general algorithm for determining such contexts, discuss its implementation-related issues, and propose a heuristic that determines temporal contexts efficiently. To demonstrate the effectiveness of our strategy, we evaluated it using two distinct collections: ACM-DL and MedLine. We first evaluated the reduction in both the effort to build a classifier and the entropy associated with each context. We then evaluated whether these observed reductions translate into better classification performance by employing a very simple classifier, majority voting.
The results show that we achieved precision gains of up to 30% compared to a version that is not temporally contextualized, and the same accuracy as a state-of-the-art classifier (SVM), while running up to hundreds of times faster.
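
The abstract does not spell out the context-selection algorithm or the heuristic, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a temporal context can be approximated by a fixed symmetric window of years around the test document's creation date, selected per term, and that every training document in the context casts one vote for its class. The toy data and the names `TRAIN`, `temporal_context`, `entropy`, and `classify` are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical toy training set: (terms, class label, creation year).
TRAIN = [
    ({"neural", "network"}, "AI", 1990),
    ({"neural", "network"}, "AI", 1992),
    ({"network", "protocol"}, "Networks", 2004),
    ({"network", "routing"}, "Networks", 2005),
    ({"network", "protocol"}, "Networks", 2006),
    ({"neural", "learning"}, "AI", 2005),
]


def temporal_context(train, term, year, window):
    """Class labels of training documents that contain `term` and were
    created within `window` years of `year`.

    Assumption: a fixed symmetric window stands in for the paper's
    context-selection heuristic, which the abstract does not describe.
    """
    return [label for terms, label, y in train
            if term in terms and abs(y - year) <= window]


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def classify(train, doc_terms, year, window):
    """Majority voting: each training document in the temporal context of
    each term of the test document casts one vote for its class."""
    votes = Counter()
    for term in sorted(doc_terms):
        votes.update(temporal_context(train, term, year, window))
    return votes.most_common(1)[0][0] if votes else None
```

In this toy collection the term "network" is associated with the class "AI" in the early 1990s and with "Networks" in the mid-2000s. Calling `classify(TRAIN, {"network"}, 1991, 2)` restricts the vote to documents from 1989 to 1993, whereas a very large window uses the whole collection and lets the more numerous recent documents dominate. The `entropy` helper can be used to compare how mixed the class distribution is inside a narrow context versus the full collection.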