Evaluating a probabilistic model for cross-lingual information retrieval

Authors:
Jinxi Xu;Ralph Weischedel;Chanh Nguyen
Affiliations:
BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA
Venue:
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2001

Citing 12
Cited 38

Using statistical testing in the evaluation of retrieval experiments

SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Pivoted document length normalization

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Comparing representations in Chinese information retrieval

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Resolving ambiguity for cross-language retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A hidden Markov model information retrieval system

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval as statistical translation

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Improving the effectiveness of information retrieval with local context analysis

ACM Transactions on Information Systems (TOIS)
Structured translation for cross-language information retrieval

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
Should we translate the documents or the queries in cross-language information retrieval?

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics

Statistical cross-language information retrieval using n-best query translations

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Cross-lingual relevance models

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Empirical studies in strategies for Arabic retrieval

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic structured query methods

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Stemming in the language modeling framework

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Statistical Models for Monolingual and Bilingual Information Retrieval

Information Retrieval
Embedding web-based statistical translation models in cross-language information retrieval

Computational Linguistics - Special issue on web as corpus
A month to topic detection and tracking in Hindi

ACM Transactions on Asian Language Information Processing (TALIP)
Hindi CLIR in thirty days

ACM Transactions on Asian Language Information Processing (TALIP)
Cross-lingual retrieval for Hindi

ACM Transactions on Asian Language Information Processing (TALIP)
Anchor text mining for translation of Web queries: A transitive translation approach

ACM Transactions on Information Systems (TOIS)
Translating unknown queries with web corpora for cross-language information retrieval

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Parsimonious language models for information retrieval

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Relevancy based semantic interoperation of reuse repositories

Proceedings of the 12th ACM SIGSOFT twelfth international symposium on Foundations of software engineering
Technical issues of cross-language information retrieval: a review

Information Processing and Management: an International Journal - Special issue: Cross-language information retrieval
Structured queries, language modeling, and relevance modeling in cross-language information retrieval

Information Processing and Management: an International Journal - Special issue: Cross-language information retrieval
Empirical studies on the impact of lexical resources on CLIR performance

Information Processing and Management: an International Journal - Special issue: Cross-language information retrieval
Mining comparable bilingual text corpora for cross-language information integration

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Mining correlated bursty topic patterns from coordinated text streams

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Extending query translation to cross-language query expansion with markov chain models

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Using Web resources to construct multilingual medical thesaurus for cross-language medical information retrieval

Decision Support Systems
Statistical Language Models for Information Retrieval: A Critical Review

Foundations and Trends in Information Retrieval
Translation disambiguation for cross-language information retrieval using context-based translation probability

Journal of Information Science
Cross language name matching

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Language and translation model adaptation using comparable corpora

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Research on English-Chinese bi-directional cross-language information retrieval

Proceedings of the 2005 joint Chinese-German conference on Cognitive systems
Estimation of statistical translation models based on mutual information for ad hoc information retrieval

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
S3K: seeking statement-supporting top-K witnesses
Systematic evaluation of machine translation methods for image and video annotation

CIVR'05 Proceedings of the 4th international conference on Image and Video Retrieval
Extracting multilingual topics from unaligned comparable corpora

ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
MSU at ImageCLEF: cross language and interactive image retrieval

CLEF'04 Proceedings of the 5th conference on Cross-Language Evaluation Forum: multilingual Information Access for Text, Speech and Images
Translation techniques in cross-language information retrieval

ACM Computing Surveys (CSUR)
Translation model based cross-lingual language model adaptation: from word models to phrase models

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Twitter translation using translation-based cross-lingual retrieval

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Bidirectional semi-supervised learning with graphs

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval

Journal of the American Society for Information Science and Technology
A unified framework for monolingual and cross-lingual relevance modeling based on probabilistic topic models

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Cross-language information retrieval models based on latent topic models trained with document-aligned comparable corpora

Information Retrieval

Abstract

This work proposes and evaluates a probabilistic cross-lingual retrieval system. The system uses a generative model to estimate the probability that a document in one language is relevant, given a query in another language. An important component of the model is the set of translation probabilities from terms in documents to terms in a query. We evaluate the approach under three conditions: 1) the only resource is a manually generated bilingual word list, 2) the only resource is a parallel corpus, and 3) both resources are combined in a mixture model. With the combined resources, the system reaches about 90% of monolingual performance in retrieving Chinese documents. For Spanish, the system achieves 85% of monolingual performance using only a pseudo-parallel Spanish-English corpus. Retrieval results are comparable with those of the structural query translation technique (Pirkola, 1998) when bilingual lexicons are used for query translation. When parallel texts are used in addition to conventional lexicons, our approach achieves better retrieval results than the structural query translation technique but requires more computation. It also produces slightly better results than using a machine translation system for CLIR, but the improvement over the MT system is not significant.
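To make the abstract's description concrete, the following is a minimal sketch of one way a generative cross-lingual scorer of this kind could be written. It is not taken from the paper: the function names, the smoothing weight `alpha`, the interpolation weight `weight`, the background-probability floor, and the toy tables in the usage note are all assumptions chosen for illustration. The sketch scores a document by the log-probability that it generates the query, where each query term is produced either by a background query-language model or by translating a term drawn from the document, and it mixes two translation tables (for example, one from a bilingual word list and one from a parallel corpus) by linear interpolation.

```python
import math
from collections import Counter


def mix_translation_probs(p_lexicon, p_corpus, weight=0.5):
    """Linearly interpolate two tables of P(query_term | doc_term).

    Each table maps doc_term -> {query_term: probability}.
    `weight` is the share given to the lexicon table (an assumed value).
    """
    mixed = {}
    for doc_term in set(p_lexicon) | set(p_corpus):
        lex = p_lexicon.get(doc_term, {})
        cor = p_corpus.get(doc_term, {})
        mixed[doc_term] = {
            q: weight * lex.get(q, 0.0) + (1.0 - weight) * cor.get(q, 0.0)
            for q in set(lex) | set(cor)
        }
    return mixed


def log_query_likelihood(query_terms, doc_terms, p_trans, p_background, alpha=0.3):
    """Return log P(query | document) under an assumed two-state mixture.

    For each query term q:
        P(q | D) = alpha * P(q | background)
                   + (1 - alpha) * sum_d P(d | D) * P(q | d)
    where P(d | D) is the relative frequency of doc term d in the document.
    """
    counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for q in query_terms:
        translated = sum(
            (n / doc_len) * p_trans.get(d, {}).get(q, 0.0)
            for d, n in counts.items()
        )
        # Small floor for unseen query terms so the logarithm stays finite.
        p = alpha * p_background.get(q, 1e-9) + (1.0 - alpha) * translated
        score += math.log(p)
    return score
```

As a usage illustration with made-up numbers: given a lexicon table `{"gato": {"cat": 1.0}}` and a corpus table `{"gato": {"cat": 0.5, "kitty": 0.5}, "perro": {"dog": 1.0}}`, the mixed table assigns "cat" a higher probability from "gato" than either "kitty" or any translation of "perro", so a document containing "gato" scores higher for the query "cat" than a document containing only "perro". Documents would then be ranked by this score.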