Translation corpus source and size in bilingual retrieval

Authors:
Paul McNamee; James Mayfield; Charles Nicholas
Affiliations:
Johns Hopkins University, Baltimore, MD; Johns Hopkins University, Baltimore, MD; UMBC, Baltimore, MD
Venue:
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Year:
2009

Abstract

This paper explores corpus-based bilingual retrieval where the translation corpora used vary by source and size. We find that the quality of translation alignments and the domain of the bitext are important. In some settings these factors are more critical than corpus size. We also show that judicious choice of tokenization can reduce the amount of bitext required to obtain good bilingual retrieval performance.
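
The tokenization point can be illustrated with a minimal sketch. The abstract does not say which tokenizations were compared, so the use of overlapping character n-grams as the alternative to whole words is an assumption here, as are the Dice-coefficient translation table, the toy English-Spanish bitext, and every function name. The sketch only shows why sub-word tokens might need less aligned text: a query word never seen in the bitext can still share n-grams with words that were seen.

```python
from collections import Counter, defaultdict

# Hypothetical three-sentence bitext, invented purely for illustration.
BITEXT = [
    ("the fishermen fish", "los pescadores pescan"),
    ("fish quotas", "cuotas de pesca"),
    ("the quotas", "las cuotas"),
]

def words(text):
    """Whitespace word tokenization."""
    return text.lower().split()

def char_ngrams(text, n=4):
    """Overlapping character n-grams within each word; short words kept whole."""
    grams = []
    for w in text.lower().split():
        if len(w) <= n:
            grams.append(w)
        else:
            grams.extend(w[i:i + n] for i in range(len(w) - n + 1))
    return grams

def build_table(bitext, tokenize):
    """Map each source token to the target token with the highest Dice score
    over aligned sentence pairs (ties broken by the larger string)."""
    src_df, tgt_df = Counter(), Counter()
    cooc = defaultdict(Counter)
    for src, tgt in bitext:
        s_tokens, t_tokens = set(tokenize(src)), set(tokenize(tgt))
        src_df.update(s_tokens)
        tgt_df.update(t_tokens)
        for s in s_tokens:
            cooc[s].update(t_tokens)
    table = {}
    for s, counts in cooc.items():
        table[s] = max(
            counts,
            key=lambda t: (2 * counts[t] / (src_df[s] + tgt_df[t]), t),
        )
    return table

def translate_query(query, table, tokenize):
    """Translate token by token, dropping tokens absent from the table."""
    return [table[t] for t in tokenize(query) if t in table]

word_table = build_table(BITEXT, words)
gram_table = build_table(BITEXT, char_ngrams)

# "fishing" never occurs as a whole word in the bitext.
word_result = translate_query("fishing", word_table, words)
gram_result = translate_query("fishing", gram_table, char_ngrams)
print("words:  ", word_result)
print("4-grams:", gram_result)
```

With word tokens the query "fishing" has no entry in the table, while with 4-grams the shared gram "fish" still maps to a target-side gram. This is only a toy demonstration of the mechanism, not a reproduction of the paper's experiments.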