Temporally-aware algorithms for document classification

Authors:
Thiago Salles; Leonardo Rocha; Gisele L. Pappa; Fernando Mourão; Wagner Meira, Jr.; Marcos Gonçalves
Affiliations:
Fed. Univ. of Minas Gerais, Belo Horizonte, Brazil (Salles, Pappa, Mourão, Meira, Gonçalves); Fed. Univ. São João Del Rei, São João Del Rei, Brazil (Rocha)
Venue:
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Year:
2010

Abstract

Automatic Document Classification (ADC) remains one of the major problems in information retrieval. It usually follows a supervised learning strategy: a classification model is first built from pre-classified documents and then used to classify unseen documents. Most supervised algorithms assume that all training documents provide equally important information. In practice, however, a document may be more or less important for building the classification model depending on several factors, such as its timeliness, the venue in which it was published, and its authors. In this paper, we are particularly concerned with the impact that temporal effects may have on ADC and with how to minimize that impact. To deal with these effects, we introduce a temporal weighting function (TWF) and propose a methodology for determining it for a given document collection. We applied the proposed methodology to the ACM-DL and Medline collections and found that the TWF of both follows a lognormal distribution. We then extend three ADC algorithms (namely kNN, Rocchio, and Naïve Bayes) to incorporate the TWF. Experiments showed that the temporally-aware classifiers achieved significant gains, outperforming (or at least matching) state-of-the-art algorithms.
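The abstract does not specify the TWF's parameterization or exactly how it is folded into each classifier. As a hedged illustration only, the Python sketch below assumes a lognormal-shaped TWF over the absolute temporal distance between a training document and the test document, and uses it to scale neighbour votes in kNN. The names `lognormal_twf`, `temporal_knn` and `cosine`, the parameters `mu`, `sigma` and `k`, the |delta| + 1 shift, and the yearly timestamps are all hypothetical choices for illustration, not the paper's method or fitted values.

```python
import math
from collections import defaultdict


def lognormal_twf(delta, mu=0.0, sigma=1.0):
    """Hypothetical temporal weighting function (TWF).

    Returns a lognormal-density-shaped weight for a training document whose
    timestamp differs from the test document's by `delta` time units
    (e.g. years). `mu` and `sigma` are illustrative defaults, NOT values
    fitted in the paper. The distance is taken as |delta| + 1 (an assumption)
    so that documents from the same period (delta == 0) fall inside the
    lognormal's support.
    """
    x = abs(delta) + 1.0
    return math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi)
    )


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def temporal_knn(train, query_vec, query_time, k=3, twf=lognormal_twf):
    """Temporally weighted kNN (hypothetical sketch).

    `train` is a list of (vector, label, timestamp) triples. Each of the k
    most similar neighbours votes for its label with weight
    similarity * twf(timestamp - query_time).
    """
    neighbours = sorted(
        ((cosine(vec, query_vec), label, t) for vec, label, t in train),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, label, t in neighbours:
        votes[label] += sim * twf(t - query_time)
    return max(votes, key=votes.get)


# Toy collection: two older documents labelled "old", one recent "new".
train = [
    ({"svm": 1.0}, "old", 2000),
    ({"svm": 1.0}, "old", 2000),
    ({"svm": 1.0}, "new", 2010),
]
print(temporal_knn(train, {"svm": 1.0}, 2010))                    # new
print(temporal_knn(train, {"svm": 1.0}, 2010, twf=lambda d: 1.0))  # old
```

With the TWF, the single recent neighbour outweighs the two older ones; with a constant weight (`twf=lambda d: 1.0`) the sketch falls back to plain similarity-weighted kNN. Rocchio and Naïve Bayes could plausibly be adapted in the same spirit by scaling each training document's contribution to the class centroids or term counts, though that too is an assumption here.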