A study of statistical models for query translation: finding a good unit of translation

Authors:
Jianfeng Gao; Jian-Yun Nie
Affiliations:
Microsoft Research; Université de Montréal
Venue:
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2006

References (21):

Numerical recipes in C (2nd ed.): the art of scientific computing.
Resolving ambiguity for cross-language retrieval. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval.
An introduction to variational methods for graphical models. Learning in graphical models.
Improving query translation for cross-language information retrieval using statistical models. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.
Information Retrieval.
Resolving query translation ambiguity using a decaying co-occurrence model and syntactic dependence relations. SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval.
Using Statistical Term Similarity for Sense Disambiguation in Cross-Language Information Retrieval. Information Retrieval.
Pattern Classification (2nd Edition).
The Web as a parallel corpus. Computational Linguistics - Special issue on web as corpus.
Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics.
Noun phrase translation.
A syntax-based statistical translation model. ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics.
Minimum error rate training in statistical machine translation. ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1.
Linear discriminant model for information retrieval. Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval.
A maximum coherence model for dictionary-based cross-language information retrieval. Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval.
Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach. Computational Linguistics.
Phrasal cohesion and statistical machine translation. EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10.
A hierarchical phrase-based model for statistical machine translation. ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics.
Dependency treelet translation: syntactically informed phrasal SMT. ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics.
Machine translation using probabilistic synchronous dependency insertion grammars. ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics.
Automatic acquisition of Chinese–English parallel corpus from the web. ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval.

Cited by (11):

Extending query translation to cross-language query expansion with Markov chain models. Proceedings of the sixteenth ACM conference on Conference on information and knowledge management.
A Hybrid Technique for English-Chinese Cross Language Information Retrieval. ACM Transactions on Asian Language Information Processing (TALIP).
Gcon: a graph-based technique for resolving ambiguity in query translation candidates. Proceedings of the 2008 ACM symposium on Applied computing.
Advanced Information Retrieval. Electronic Notes in Theoretical Computer Science (ENTCS).
Named entity recognition in query. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval.
An automatic translation of tags for multimedia contents using folksonomy networks. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval.
A refinement framework for cross language text categorization. AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology.
Selecting automatically the best query translations. Large Scale Semantic Access to Content (Text, Image, Video, and Sound).
Learning inter-related statistical query translation models for English-Chinese bi-directional CLIR. IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three.
Translation techniques in cross-language information retrieval. ACM Computing Surveys (CSUR).
Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval. Journal of the American Society for Information Science and Technology.

Abstract

This paper presents a study of three statistical query translation models that use different units of translation. We begin by reviewing a word-based translation model that uses co-occurrence statistics to resolve translation ambiguities. We then formulate the translation selection problem within the framework of graphical models, which allows us to discuss the modeling assumptions and limitations of the co-occurrence model and motivates the search for better translation units. Next, we present two models that use larger, linguistically motivated translation units (noun phrases and dependency triples). For each model, the modeling and training methods are described in detail. All query translation models are evaluated on TREC collections. Results show that larger translation units lead to more specific models that usually achieve better translation and cross-language information retrieval results.
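To make the idea of co-occurrence-based translation selection concrete, here is a minimal sketch in Python. It assumes that each query term comes with a list of candidate translations (for example, from a bilingual dictionary) and that a table of target-language co-occurrence scores has already been computed. The sketch then picks one translation per term so that the chosen words co-occur as strongly as possible. The function names `cohesion` and `select_translations`, the toy words, and the scores are all hypothetical. The summed pairwise score and the exhaustive search are illustrative choices, not the paper's exact scoring function or search procedure.

```python
from itertools import combinations, product

# Hypothetical toy data: candidate translations for a three-term query,
# and symmetric co-occurrence scores between target-language words.
EXAMPLE_CANDIDATES = [["bank", "shore"], ["interest", "hobby"], ["rate", "ratio"]]
EXAMPLE_COOC = {
    ("bank", "interest"): 2.0,
    ("bank", "rate"): 1.5,
    ("interest", "rate"): 3.0,
    ("shore", "hobby"): 0.1,
}


def cohesion(translations, cooc):
    """Sum the co-occurrence scores of every pair of chosen translations."""
    total = 0.0
    for a, b in combinations(translations, 2):
        # Scores are stored once per unordered pair; unseen pairs count as 0.
        total += cooc.get((a, b), cooc.get((b, a), 0.0))
    return total


def select_translations(candidates, cooc):
    """Pick one translation per query term so that total cohesion is maximal.

    candidates: one list of candidate translations per source query term.
    cooc: dict mapping (word_a, word_b) to a co-occurrence score.
    Exhaustive search: cost is the product of the candidate-list sizes.
    """
    best, best_score = None, float("-inf")
    for combo in product(*candidates):
        score = cohesion(combo, cooc)
        if score > best_score:  # strict ">" keeps the first combination on ties
            best, best_score = combo, score
    if best is None:
        raise ValueError("every query term needs at least one candidate")
    return list(best), best_score


if __name__ == "__main__":
    print(select_translations(EXAMPLE_CANDIDATES, EXAMPLE_COOC))
    # prints (['bank', 'interest', 'rate'], 6.5)
```

The sketch scores each combination as a sum over word pairs, so it only ever sees pairwise evidence. One way to read this is as edge potentials in an undirected graph over the query terms, which is the kind of assumption a graphical-model formulation makes explicit. Exhaustive search grows exponentially with query length, so a practical system would more likely use a greedy or other approximate search.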