Results of applying probabilistic IR to OCR text
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Combining the language model and inference network approaches to retrieval
Information Processing and Management: an International Journal - Special issue: Bayesian networks and information retrieval
A Markov random field model for term dependencies
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Improving weak ad-hoc queries using wikipedia as external corpus
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
A knowledge-based search engine powered by wikipedia
Proceedings of the sixteenth ACM conference on information and knowledge management
Book search: indexing the valuable parts
Proceedings of the 2008 ACM workshop on Research advances in large digital book repositories
Wikipedia pages as entry points for book search
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Query dependent pseudo-relevance feedback based on wikipedia
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Book search experiments: investigating IR methods for the indexing and retrieval of books
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Overview of the INEX 2009 book track
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
Overview of the INEX 2010 book track: scaling up the evaluation using crowdsourcing
INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval
In this paper we describe our participation and present our contributions in the INEX 2010 Book Track. Digitized books are now a common source of information on the Web; however, OCR sometimes introduces errors that can penalize information retrieval. We propose a method for correcting hyphenations in the books and we analyse its impact on the Best Books for Reference task. The observed improvement is around 1%. This year we also experimented with different query expansion techniques. The first one consists of selecting informative words from a Wikipedia page related to the topic. The second one uses a dependency parser to enrich the query with the detected phrases using a Markov Random Field model. We show that there is a significant improvement over the state-of-the-art when using a large weighted list of Wikipedia words, while hyphenation correction has an impact on their distribution over the book corpus.
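
The abstract does not say how hyphenations are corrected, so the following is only a minimal sketch of one plausible approach, not the method used in the paper. It assumes that a word split by an end-of-line hyphen should be rejoined only when the joined form appears in a known vocabulary; the function name `dehyphenate` and the `vocabulary` argument are hypothetical names introduced for illustration.

```python
import re

# A word fragment ending a line with a hyphen, followed by the fragment
# that starts the next line.
_SPLIT = re.compile(r"(\w+)-\n(\w+)")


def dehyphenate(text, vocabulary):
    """Rejoin words split across lines (hypothetical helper, dictionary-based).

    Assumption: `vocabulary` is a set of lowercase known words. If the joined
    form is in it, the hyphen is treated as a line-break artifact and dropped.
    Otherwise the hyphen is kept, since it may belong to a true compound.
    """
    def fix(match):
        head, tail = match.group(1), match.group(2)
        joined = head + tail
        if joined.lower() in vocabulary:
            return joined
        # Unknown joined form: remove only the line break, keep the hyphen.
        return head + "-" + tail

    return _SPLIT.sub(fix, text)


if __name__ == "__main__":
    sample = "infor-\nmation retrieval of well-\nknown books"
    known = {"information", "retrieval", "books", "well", "known"}
    print(dehyphenate(sample, known))
```

With this sample, "infor-" and "mation" are merged because "information" is in the vocabulary, whereas "well-known" keeps its hyphen because "wellknown" is not. Repairing index terms in this way changes their frequencies over the corpus, which is consistent with the effect on word distribution mentioned in the abstract.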