The use of monolingual context vectors for missing translations in cross-language information retrieval

Authors:
Yan Qu; Gregory Grefenstette; David A. Evans
Affiliations:
Clairvoyance Corporation, Pittsburgh, PA; LIC2M/SCRI/LIST/DTSI/CEA, Fontenay-aux-Roses Cedex, France; Clairvoyance Corporation, Pittsburgh, PA
Venue:
IJCNLP'05 Proceedings of the Second International Joint Conference on Natural Language Processing
Year:
2005

Abstract

For cross-language text retrieval systems that rely on bilingual dictionaries to bridge the gap between the source query language and the target document language, good dictionary coverage is imperative. For terms with missing translations, most systems expand the existing translation dictionaries. In this paper, instead of expanding the lexicon, we explore whether the context of unknown terms can help mitigate the loss of meaning caused by missing translations. Our approach consists of two steps: (1) identify terms that are closely associated with the unknown source-language terms and collect them as context vectors, and (2) use the translations of the associated terms in the context vectors as surrogate translations of the unknown terms. We describe a query-independent version and a query-dependent version of these monolingual context vectors. The methods are evaluated on Japanese-to-English retrieval using the NTCIR-3 topics and data sets. Empirical results show that both methods improved CLIR performance for short and medium-length queries and that the query-dependent context vectors performed better than the query-independent ones.
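
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how association is measured, so this sketch assumes plain document-level co-occurrence counts. The function names (`build_context_vector`, `surrogate_translations`), the `top_k` cutoff, and the toy romanized corpus and dictionary are all hypothetical. Only a query-independent variant is shown.

```python
from collections import Counter

def build_context_vector(term, corpus, top_k=3):
    """Step 1 (assumed measure): rank terms by how many source-language
    documents they share with `term`, and keep the top_k."""
    counts = Counter()
    for doc in corpus:
        tokens = set(doc)
        if term in tokens:
            counts.update(tokens - {term})
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top_k]]

def surrogate_translations(term, corpus, dictionary, top_k=3):
    """Step 2: if `term` has no dictionary entry, fall back to the
    translations of its associated terms."""
    if term in dictionary:
        return list(dictionary[term])
    surrogates = []
    for neighbor in build_context_vector(term, corpus, top_k):
        for translation in dictionary.get(neighbor, []):
            if translation not in surrogates:
                surrogates.append(translation)
    return surrogates

# Hypothetical toy data: a tokenized monolingual corpus (romanized for
# readability) and a bilingual dictionary that lacks "nasudakku".
corpus = [
    ["nasudakku", "shijou", "toushi"],
    ["nasudakku", "shijou", "keizai"],
    ["nasudakku", "shijou"],
    ["tenki", "ame"],
]
dictionary = {
    "shijou": ["market"],
    "keizai": ["economy"],
    "toushi": ["investment"],
}

print(surrogate_translations("nasudakku", corpus, dictionary, top_k=2))
# prints ['market', 'economy']
```

In this toy example the untranslatable query term is replaced by the translations of its two most strongly associated terms. A realistic system would likely use a weighted association score and a much larger corpus.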