Simultaneous multilingual search for translingual information retrieval

Authors:
Kristen Parton;Kathleen R. McKeown;James Allan;Enrique Henestroza
Affiliations:
Columbia University, New York, NY, USA;Columbia University, New York, NY, USA;University of Massachusetts Amherst, Amherst, MA, USA;Columbia University, New York, NY, USA
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Abstract

We consider the problem of translingual information retrieval, where monolingual searchers issue queries in a language different from the document language(s) and the results must be returned in the query language, the only language the searchers know. We present a framework for translingual IR that integrates document translation and query translation into the retrieval model. The corpus is represented as an aligned, jointly indexed "pseudo-parallel" corpus, where each document contains the original text along with its translation into the query language. The queries are formulated as multilingual structured queries, where each query term and its translations into the document language(s) are treated as a synonym set. This model leverages simultaneous search in multiple languages against jointly indexed documents to improve the accuracy of results over search using document translation or query translation alone. For query translation, we compared a statistical machine translation (SMT) approach to a dictionary-based approach. We found that using a Wikipedia-derived dictionary for named entities combined with an SMT-based dictionary worked better than SMT alone. Simultaneous multilingual search has a further property suited to translingual search: when a query matches the source text of a document but not its translation, the match indicates that the document translation may be poor. We show how close integration of CLIR and SMT allows us to improve result translation in addition to IR results.
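The core idea of the abstract, searching a jointly indexed pseudo-parallel document with synonym-set queries, can be illustrated with a toy sketch. This is not the authors' retrieval model: the function names (`build_joint_doc`, `synset_tf`, `score`), the three sample documents, and the log-scaled term-frequency scoring are all assumptions made for illustration. The only behaviour taken from the abstract is that each query term and its translations are treated as a single term, and that each indexed document holds both the source text and its translation.

```python
from collections import Counter
import math

def build_joint_doc(source_text, translated_text):
    """Hypothetical pseudo-parallel document: one bag of words holding
    both the source-language text and its query-language translation."""
    return (Counter(source_text.lower().split())
            + Counter(translated_text.lower().split()))

def synset_tf(doc, synset):
    """Treat a query term and its translations as one term by summing
    the frequencies of every member of the synonym set."""
    return sum(doc[term] for term in synset)

def score(doc, query_synsets):
    """Toy scoring (an assumption, not the paper's model): sum of
    log-scaled synonym-set frequencies."""
    return sum(math.log1p(synset_tf(doc, s)) for s in query_synsets)

# Illustrative documents: Spanish source text plus an English translation.
# In d2 the translation renders "presidente" as "chairman", so an
# English-only search for "president" would miss it.
docs = {
    "d1": build_joint_doc("el presidente visitó madrid",
                          "the president visited madrid"),
    "d2": build_joint_doc("el presidente habló ayer",
                          "the chairman spoke yesterday"),
    "d3": build_joint_doc("lluvia en madrid", "rain in madrid"),
}

# Multilingual structured query: each set is a query term plus its translations.
multilingual_query = [{"president", "presidente"}, {"madrid"}]
# Query-language-only baseline for comparison.
english_only_query = [{"president"}, {"madrid"}]

ranking = sorted(docs, key=lambda d: score(docs[d], multilingual_query),
                 reverse=True)
print(ranking)
```

In this sketch, `d2` scores zero under `english_only_query` but is still retrieved by `multilingual_query` through its source text. A match of that kind, on the source side only, is the signal the abstract describes as an indication of poor document translation.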