A first approach to CLIR using character n-grams alignment

Authors:
Jesús Vilares;Michael P. Oakes;John I. Tait
Affiliations:
Departamento de Computación, Universidade da Coruña, A Coruña, Spain;School of Computing and Technology, University of Sunderland, Sunderland, United Kingdom;School of Computing and Technology, University of Sunderland, Sunderland, United Kingdom
Venue:
CLEF'06 Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval
Year:
2006

Abstract

This paper describes the technique for translating character n-grams that we developed for our participation in CLEF 2006. The approach avoids the need for word normalization during indexing or translation, and it can also handle out-of-vocabulary words. Because it does not rely on language-specific processing, it can be applied to very different languages, even when linguistic information and resources are scarce or unavailable. Our proposal makes extensive use of freely available resources and aims to speed up the n-gram alignment process relative to similar approaches.
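
As an illustration of why character n-grams can stand in for word normalization, the following minimal sketch splits words into overlapping character n-grams. It is not the authors' implementation: the function name `char_ngrams`, the default `n=4`, and the handling of short words are assumptions made for this example, and the alignment and translation steps described in the paper are not shown.

```python
def char_ngrams(text, n=4):
    """Return the overlapping character n-grams of each whitespace-separated word.

    Illustrative assumption: words no longer than n are kept whole.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            # Slide a window of width n over the word, one character at a time.
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


# Morphological variants share n-grams without any stemming or lemmatization.
singular = set(char_ngrams("tomato"))
plural = set(char_ngrams("tomatoes"))
shared = singular & plural
print(sorted(shared))
```

Because the two word forms share most of their n-grams, they can match each other in an index without a stemmer, and an unseen word can still be matched through whichever of its n-grams are known.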