A maximum coherence model for dictionary-based cross-language information retrieval

Authors:
Yi Liu;Rong Jin;Joyce Y. Chai
Affiliations:
Michigan State University, East Lansing, MI;Michigan State University, East Lansing, MI;Michigan State University, East Lansing, MI
Venue:
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2005

Citing 13
Cited 11

Querying across languages: a dictionary-based approach to multilingual information retrieval

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Phrasal translation and query expansion techniques for cross-language information retrieval

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Query term disambiguation for Web cross-language information retrieval using a search engine

IRAL '00 Proceedings of the fifth international workshop on Information retrieval with Asian languages
Improving query translation for cross-language information retrieval using statistical models

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Relevance based language models

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Statistical cross-language information retrieval using n-best query translations

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Cross-lingual relevance models

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Resolving query translation ambiguity using a decaying co-occurrence model and syntactic dependence relations

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
A Tutorial on Support Vector Machines for Pattern Recognition

Data Mining and Knowledge Discovery
Using Statistical Term Similarity for Sense Disambiguation in Cross-Language Information Retrieval

Information Retrieval
Using Statistical Translation Models for Bilingual IR

CLEF '01 Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems
Embedding web-based statistical translation models in cross-language information retrieval

Computational Linguistics - Special issue on web as corpus
Using mutual information to resolve query translation ambiguities and query term weighting

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics

A study of statistical models for query translation: finding a good unit of translation

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Statistical query translation models for cross-language information retrieval

ACM Transactions on Asian Language Information Processing (TALIP)
Term disambiguation techniques based on target document collection for cross-language information retrieval: an empirical comparison of performance between techniques

Information Processing and Management: an International Journal
A Hybrid Technique for English-Chinese Cross Language Information Retrieval

ACM Transactions on Asian Language Information Processing (TALIP)
Gcon: a graph-based technique for resolving ambiguity in query translation candidates

Proceedings of the 2008 ACM symposium on Applied computing
A progressive algorithm for cross-language information retrieval based on dictionary translation

AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
A refinement framework for cross language text categorization

AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
Enhancing query translation with relevance feedback in translingual information retrieval

Information Processing and Management: an International Journal
Translation techniques in cross-language information retrieval

ACM Computing Surveys (CSUR)
Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval

Journal of the American Society for Information Science and Technology
Flat vs. hierarchical phrase-based translation models for cross-language information retrieval

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Abstract

One key to cross-language information retrieval is how to efficiently resolve the translation ambiguity of queries given their short length. This problem is even more challenging when only bilingual dictionaries are available, which is the focus of this paper. In previous research on cross-language information retrieval using bilingual dictionaries, word co-occurrence statistics are used to determine the most likely translations of queries. In this paper, we propose a novel statistical model, named the "maximum coherence model", which estimates translation probabilities for query words that are consistent with the word co-occurrence statistics. Unlike previous work, where a binary decision is made for the selection of translations, the new model maintains the uncertainty in translating query words when their sense ambiguity is difficult to resolve. Furthermore, the new model is able to estimate translations of multiple query words simultaneously. This is in contrast to many previous approaches, where translations of individual query words are determined independently. Empirical studies with TREC datasets have shown that the maximum coherence model achieves a relative improvement of 10% to 40% in cross-language information retrieval compared to other approaches that also use word co-occurrence statistics for sense disambiguation.
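
The abstract does not give the model's objective function or its optimizer, so the following is only an illustrative sketch of the two properties it names: soft translation probabilities instead of a binary choice, and joint estimation across query words. It swaps in a simple multiplicative re-weighting in which each candidate translation is scored by its expected co-occurrence with the current translations of the other query words. This is not the paper's method. The function name, the `COOC` table, the query tokens, and all numbers are hypothetical and invented for illustration.

```python
def estimate_translation_probs(candidates, sim, iterations=10):
    """Jointly estimate soft translation probabilities (illustrative only).

    candidates: dict mapping each source query word to its dictionary
                translations in the target language.
    sim:        function (t1, t2) -> non-negative co-occurrence score.
    Returns a dict: source word -> {translation: probability}.
    """
    # Start from a uniform distribution over each word's dictionary entries.
    probs = {s: {t: 1.0 / len(ts) for t in ts} for s, ts in candidates.items()}
    for _ in range(iterations):
        updated = {}
        for s, ts in candidates.items():
            weights = {}
            for t in ts:
                # Expected co-occurrence of t with the current translations
                # of all *other* query words.
                support = sum(
                    p * sim(t, other)
                    for s2, dist in probs.items() if s2 != s
                    for other, p in dist.items()
                )
                weights[t] = probs[s][t] * support
            total = sum(weights.values())
            # Without any co-occurrence evidence, keep the old distribution
            # so the uncertainty is preserved rather than forced to a choice.
            updated[s] = ({t: w / total for t, w in weights.items()}
                          if total > 0 else dict(probs[s]))
        probs = updated
    return probs


# Hypothetical toy co-occurrence scores between target-language words.
COOC = {
    frozenset(("bank", "interest")): 0.8,
    frozenset(("bank", "curiosity")): 0.1,
    frozenset(("shore", "interest")): 0.1,
    frozenset(("shore", "curiosity")): 0.1,
}


def sim(a, b):
    # Symmetric lookup; word pairs never seen together score 0.
    return COOC.get(frozenset((a, b)), 0.0)


# Hypothetical three-word query with two dictionary candidates per word.
candidates = {
    "q_bank": ["bank", "shore"],
    "q_rate": ["interest", "curiosity"],
    "q_misc": ["alpha", "beta"],  # no co-occurrence evidence at all
}

probs = estimate_translation_probs(candidates, sim)
for word, dist in probs.items():
    print(word, {t: round(p, 3) for t, p in dist.items()})
```

In this toy query the first two words reinforce each other, so most of their probability mass moves to the mutually coherent pair, while the third word, for which the statistics offer no evidence, keeps its uniform distribution. A binary-selection approach would instead have to commit to one candidate for every word.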