Temporal feature modification for retrospective categorization

Authors:
Robert Liebscher; Richard K. Belew
Affiliations:
University of California, San Diego; University of California, San Diego
Venue:
FeatureEng '05 Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing
Year:
2005

Abstract

We show that the intelligent use of one small piece of contextual information, a document's publication date, can improve the performance of classifiers trained on a text categorization task. We focus on academic research documents, where the date of publication undoubtedly has an effect on an author's choice of words. To exploit this contextual feature, we propose the technique of temporal feature modification, which takes various sources of lexical change into account, including changes in term frequency, associative strength between terms and categories, and dynamic categorization systems. We present results of classification experiments using both full-text papers and abstracts of conference proceedings, showing improved classification accuracy across the whole collection, with performance increases of greater than 40% when temporal features are exploited. The technique is fast, classifier-independent, and works well even when making only a few modifications.
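The abstract does not spell out how features are modified, so the following is only a rough sketch of one possible reading, not the authors' algorithm. It assumes that dates are bucketed into fixed-width periods, that a term counts as "strongly associated" with a category when most of its occurrences in a period fall in that category, and that a modification means rewriting the term as a period-specific pseudo-term. The names `period_of`, `find_associations`, and `modify`, the thresholds, and the toy documents are all hypothetical.

```python
from collections import Counter, defaultdict

def period_of(year, width=5):
    """Map a year to the first year of its period (assumed fixed-width buckets)."""
    return year - year % width

def find_associations(docs, min_share=0.8, min_count=2):
    """Return {(period, term): category} for terms concentrated in one category.

    docs is a list of (tokens, year, category) training examples.
    The share/count thresholds are illustrative assumptions.
    """
    counts = defaultdict(Counter)  # (period, term) -> category counts
    for tokens, year, category in docs:
        p = period_of(year)
        for term in tokens:
            counts[(p, term)][category] += 1
    marks = {}
    for key, cat_counts in counts.items():
        category, count = cat_counts.most_common(1)[0]
        total = sum(cat_counts.values())
        if count >= min_count and count / total >= min_share:
            marks[key] = category
    return marks

def modify(tokens, year, marks):
    """Rewrite marked terms as pseudo-terms, using only the document's date."""
    p = period_of(year)
    return [f"{t}@{marks[(p, t)]}" if (p, t) in marks else t for t in tokens]

# Toy training data (invented for illustration).
docs = [
    (["neural", "network", "training"], 1991, "connectionism"),
    (["neural", "network", "layers"], 1992, "connectionism"),
    (["neural", "imaging", "cortex"], 2003, "neuroscience"),
    (["neural", "imaging", "brain"], 2004, "neuroscience"),
]
marks = find_associations(docs)
early = modify(["neural", "models"], 1993, marks)
late = modify(["neural", "models"], 2002, marks)
```

Because `modify` needs only the publication year, it can be applied to unlabeled documents before any classifier sees them, which is in the spirit of the classifier-independence claimed in the abstract. In this toy setup the same word becomes a different feature depending on when the document was written.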