This paper addresses the problem of long-term language change in information retrieval (IR) systems. IR research has often ignored lexical drift. But in the emerging domain of massive digitized book collections, the risk of vocabulary mismatch due to language change is high. Collections such as Google Books and HathiTrust contain text written in the vernaculars of many centuries. With respect to IR, changes in vocabulary and orthography make 14th-century English qualitatively different from 21st-century English. This difference challenges retrieval models that rely on keyword matching. With this challenge in mind, we ask: given a query written in contemporary English, how can we retrieve relevant documents that were written in early English? We argue that search in historically diverse corpora is similar to cross-language information retrieval (CLIR). By treating "modern" English and "archaic" English as distinct languages, CLIR techniques can improve what we call cross-temporal IR (CTIR). We focus on ways to combine evidence to improve CTIR effectiveness, proposing and testing several ways to handle language change during book search. We find that a principled combination of three sources of evidence during relevance feedback yields strong CTIR performance.