Embedding web-based statistical translation models in cross-language information retrieval

Authors:
Wessel Kraaij; Jian-Yun Nie; Michel Simard
Affiliations:
TNO TPD, P.O. Box 155, 2600 AD Delft, The Netherlands; DIRO, Université de Montréal, C.P. 6128, succ. Centre-ville, Montréal, QC H3C 3J7, Canada; DIRO, Université de Montréal, C.P. 6128, succ. Centre-ville, Montréal, QC H3C 3J7, Canada
Venue:
Computational Linguistics - Special issue on web as corpus
Year:
2003

Abstract

Although more and more language pairs are covered by machine translation (MT) services, many pairs still lack translation resources. Cross-language information retrieval (CLIR) requires translation functionality of relatively low sophistication, because current information retrieval (IR) models are still based on a bag-of-words representation. The Web provides a vast resource for automatically constructing parallel corpora, which can in turn be used to train statistical translation models. The resulting translation models can be embedded in a retrieval model in several ways. In this article, we investigate both the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models into the retrieval process. Our experiments on standard CLIR test collections show that the Web-based translation models can surpass commercial MT systems on CLIR tasks. These results open up the prospect of building a fully automatic, very low-cost query translation device for CLIR.
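The statement that translation models can be embedded in a bag-of-words retrieval model can be illustrated with a minimal sketch. It assumes a unigram language-modeling retrieval model that ranks documents by the sum over terms t of P(t|Q) log P(t|D), with Jelinek-Mercer smoothing against the collection, and it translates only the query side: P(t|Q_target) = sum over source terms s of P(t|s) P(s|Q_source). The function names, the smoothing weight `lam=0.7`, and the toy translation table are hypothetical. In the article's setting, P(t|s) would come from a statistical translation model trained on parallel texts mined from the Web. This is one possible embedding, not the authors' exact formulation.

```python
import math
from collections import Counter


def translate_query_model(query_terms, trans_table):
    """Map a source-language query to a target-language term distribution.

    Computes P(t | Q_target) = sum_s P(t | s) * P(s | Q_source), where
    P(s | Q_source) is the relative frequency of s in the query and
    trans_table[s][t] holds P(t | s). Source terms missing from the
    table contribute nothing.
    """
    counts = Counter(query_terms)
    total = sum(counts.values())
    target = {}
    for s, c in counts.items():
        p_s = c / total
        for t, p_t_given_s in trans_table.get(s, {}).items():
            target[t] = target.get(t, 0.0) + p_t_given_s * p_s
    return target


def score(query_model, doc_terms, coll_counts, coll_len, lam=0.7):
    """Rank score: sum_t P(t | Q) * log P(t | D).

    P(t | D) is smoothed as lam * P_ml(t | D) + (1 - lam) * P(t | C).
    Terms whose smoothed probability is zero (absent from the whole
    collection) are skipped so the log stays defined.
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    total = 0.0
    for t, p_q in query_model.items():
        p_doc = doc_counts.get(t, 0) / doc_len if doc_len else 0.0
        p_coll = coll_counts.get(t, 0) / coll_len
        p = lam * p_doc + (1 - lam) * p_coll
        if p > 0:
            total += p_q * math.log(p)
    return total
```

For example, with a hypothetical English-to-French table `{"bank": {"banque": 0.7, "rive": 0.3}, "loan": {"prêt": 0.9, "emprunt": 0.1}}`, the query `["bank", "loan"]` is spread over four French terms, and the document `["banque", "prêt", "taux"]` scores higher than `["rive", "fleuve", "eau"]`. Translating the query model keeps the cost to one translation per query; applying P(t|s) to the document model instead is another of the possible embeddings.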