Query representation for cross-temporal information retrieval

Authors:
Miles Efron
Affiliations:
University of Illinois, Champaign, IL, USA
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013


Abstract

This paper addresses the problem of long-term language change in information retrieval (IR) systems. IR research has often ignored lexical drift. But in the emerging domain of massive digitized book collections, the risk of vocabulary mismatch due to language change is high. Collections such as Google Books and the Hathi Trust contain text written in the vernaculars of many centuries. With respect to IR, changes in vocabulary and orthography make 14th-Century English qualitatively different from 21st-Century English. This challenges retrieval models that rely on keyword matching. With this challenge in mind, we ask: given a query written in contemporary English, how can we retrieve relevant documents that were written in early English? We argue that search in historically diverse corpora is similar to cross-language retrieval (CLIR). By considering "modern" English and "archaic" English as distinct languages, CLIR techniques can improve what we call cross-temporal IR (CTIR). We focus on ways to combine evidence to improve CTIR effectiveness, proposing and testing several ways to handle language change during book search. We find that a principled combination of three sources of evidence during relevance feedback yields strong CTIR performance.
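
The idea of treating "modern" and "archaic" English as distinct languages can be illustrated with a generic translation-style query-likelihood scorer. The abstract does not specify the paper's models or its three sources of evidence, so this is only a sketch under stated assumptions: the `TRANSLATIONS` table, its weights, the function names, and the smoothing choices are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical table: for each modern term, period spellings with an
# assumed P(variant | modern term). Weights are invented for illustration,
# not estimated from any corpus.
TRANSLATIONS = {
    "love": {"love": 0.6, "loue": 0.3, "luf": 0.1},
    "knight": {"knight": 0.5, "knyght": 0.4, "kniht": 0.1},
}


def expand_query(modern_terms, table=TRANSLATIONS):
    """Map each modern query term to a dict of weighted variants."""
    expanded = []
    for term in modern_terms:
        # Terms missing from the table translate to themselves with weight 1.
        expanded.append(table.get(term, {term: 1.0}))
    return expanded


def score(doc_tokens, expanded_query, collection_tokens, mu=10.0):
    """Log query likelihood where each query term sums over its variants."""
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(collection_tokens)
    coll_len = len(collection_tokens)
    doc_len = len(doc_tokens)
    log_p = 0.0
    for variants in expanded_query:
        p_term = 0.0
        for variant, weight in variants.items():
            # Add-one collection model keeps unseen variants above zero.
            p_coll = (coll_counts[variant] + 1) / (coll_len + len(coll_counts) + 1)
            # Dirichlet-smoothed estimate of P(variant | document).
            p_doc = (doc_counts[variant] + mu * p_coll) / (doc_len + mu)
            p_term += weight * p_doc
        log_p += math.log(p_term)
    return log_p


archaic_doc = ["the", "knyght", "dide", "loue", "hir"]
unrelated_doc = ["the", "ship", "sailed", "to", "the", "port"]
collection = archaic_doc + unrelated_doc

query = expand_query(["knight", "love"])
ranked = sorted(
    [("archaic", archaic_doc), ("unrelated", unrelated_doc)],
    key=lambda pair: score(pair[1], query, collection),
    reverse=True,
)
```

In this toy setup neither document contains the literal strings "knight" or "love", so exact keyword matching cannot separate them. Summing over weighted variants is what lets the archaic document rank first.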