Character N-grams translation in cross-language information retrieval

Authors:
Jesús Vilares;Michael P. Oakes;Manuel Vilares
Affiliations:
Department of Computer Science, University of A Coruña, A Coruña, Spain;School of Computing and Technology, University of Sunderland, Sunderland, United Kingdom;Department of Computer Science, University of Vigo, Ourense, Spain
Venue:
NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
Year:
2007

Abstract

This paper describes a new technique for the direct translation of character n-grams for use in Cross-Language Information Retrieval (CLIR) systems. Because it works on n-grams rather than whole words, the technique avoids the need for word normalization during indexing or translation, and it can also handle out-of-vocabulary words. This knowledge-light approach does not rely on language-specific processing, so it can be applied to languages of very different natures, even when linguistic information and resources are scarce or unavailable. Our proposal also aims to speed up the n-gram alignment process relative to previous approaches.
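
To make the notion of character n-grams concrete, the following is a minimal sketch of overlapping n-gram tokenization. It is not the authors' implementation: the function name `char_ngrams`, the default length `n=4`, the lowercasing step, and the whitespace-based word splitting are all assumptions made for illustration.

```python
def char_ngrams(text, n=4):
    """Split each whitespace-delimited word into overlapping character n-grams.

    Assumptions (not taken from the paper): words are lowercased, n-grams do
    not span word boundaries, and words of length <= n are kept whole.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            # Short words yield a single unit: the word itself.
            grams.append(word)
        else:
            # Slide a window of width n across the word, one character at a time.
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


if __name__ == "__main__":
    print(char_ngrams("tomatoes"))
    # ['toma', 'omat', 'mato', 'atoe', 'toes']
```

No stemmer, lemmatizer, or dictionary is involved, so any string, including a word unseen in training data, still produces indexable units. In a setting like the one the abstract describes, such units, rather than whole words, would be what is indexed and translated.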