Evaluation of the Bible as a resource for cross-language information retrieval

Authors:
Peter A. Chew;Steve J. Verzi;Travis L. Bauer;Jonathan T. McClain
Affiliations:
Sandia National Laboratories, Albuquerque, NM;Sandia National Laboratories, Albuquerque, NM;Sandia National Laboratories, Albuquerque, NM;Sandia National Laboratories, Albuquerque, NM
Venue:
MLRI '06 Proceedings of the Workshop on Multilingual Language Resources and Interoperability
Year:
2006

Abstract

An area of recent interest in cross-language information retrieval (CLIR) is the question of which parallel corpora might be best suited to tasks in CLIR, or even to what extent parallel corpora can be obtained or are necessary. One proposal, which in our opinion has been somewhat overlooked, is that the Bible holds a unique value as a multilingual corpus, being (among other things) widely available in a broad range of languages and having a high coverage of modern-day vocabulary. In this paper, we test empirically whether this claim is justified through a series of validation tests on various information retrieval tasks. Our results appear to indicate that our methodology may significantly outperform others recently proposed.