Linear time series models for term weighting in information retrieval

Authors:
Miles Efron
Affiliations:
Graduate School of Library and Information Science, University of Illinois, 501 E. Daniel St., Champaign, IL 61820
Venue:
Journal of the American Society for Information Science and Technology
Year:
2010

Cited by (11):
Using the past to score the present: extending term weighting models through revision history analysis. CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management.
Understanding temporal query dynamics. Proceedings of the fourth ACM international conference on Web search and data mining.
A word at a time: computing word relatedness using temporal semantic analysis. Proceedings of the 20th international conference on World wide web.
Keeping keywords fresh: a BM25 variation for personalized keyword extraction. Proceedings of the 2nd Temporal Web Analytics Workshop.
Temporal pseudo-relevance feedback in microblog retrieval. ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval.
Time-sensitive query auto-completion. SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval.
Expediting search trend detection via prediction of query counts. Proceedings of the sixth ACM international conference on Web search and data mining.
Fast candidate generation for real-time tweet search with bloom filter chains. ACM Transactions on Information Systems (TOIS).
Behavioral dynamics on the web: Learning, modeling, and prediction. ACM Transactions on Information Systems (TOIS).
Information Retrieval with Time Series Query. Proceedings of the 2013 Conference on the Theory of Information Retrieval.
Using temporal bursts for query modeling. Information Retrieval.


Abstract

Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize that the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. By contrast, a linear time series model of a strong discriminator's collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models. © 2010 Wiley Periodicals, Inc.
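
The hypothesis in the abstract can be illustrated with a small sketch. The abstract does not specify the three measures or the form of the linear model, so everything below is an assumption made for illustration: a first-order autoregressive fit (each observation regressed on the one before it), the function names ar1_fit_quality and temporal_weight, and the use of 1 - R^2 as a weight. It should not be read as the article's method.

```python
from typing import Sequence


def ar1_fit_quality(series: Sequence[float]) -> float:
    """R^2 of the least-squares fit x[t] ~ a + b * x[t-1].

    Assumption: a first-order model stands in for the abstract's
    unspecified "linear model of the term's prior observations".
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    prev = list(series[:-1])   # predictor: x[t-1]
    curr = list(series[1:])    # target:    x[t]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    syy = sum((c - mean_curr) ** 2 for c in curr)
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    if syy == 0:
        return 1.0  # constant target: trivially predictable
    if sxx == 0:
        return 0.0  # constant predictor explains none of the variance
    # For simple linear regression, R^2 equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


def temporal_weight(series: Sequence[float]) -> float:
    """Hypothetical weight: a poorer linear fit gives a higher weight."""
    return 1.0 - ar1_fit_quality(series)


# Made-up collection frequencies per time interval, for illustration only.
steady_term = [10, 12, 14, 16, 18, 20]    # grows smoothly with the corpus
bursty_term = [0, 9, 1, 0, 8, 0, 1, 7]    # appears in irregular bursts

print(temporal_weight(steady_term))
print(temporal_weight(bursty_term))
```

Under these assumptions the smoothly growing series is fit almost perfectly and receives a weight near zero, while the bursty series is fit poorly and receives a higher weight, which is the direction the hypothesis predicts for weak and strong discriminators respectively.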