Character N-grams translation in cross-language information retrieval

  • Authors:
  • Jesús Vilares; Michael P. Oakes; Manuel Vilares

  • Affiliations:
  • Department of Computer Science, University of A Coruña, A Coruña, Spain; School of Computing and Technology, University of Sunderland, Sunderland, United Kingdom; Department of Computer Science, University of Vigo, Ourense, Spain

  • Venue:
  • NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
  • Year:
  • 2007

Abstract

This paper describes a new technique for directly translating character n-grams for use in Cross-Language Information Retrieval (CLIR) systems. Because it works on n-grams rather than whole words, it avoids the need for word normalization during indexing or translation, and it can also handle out-of-vocabulary words. This knowledge-light approach does not rely on language-specific processing, so it can be applied to languages of very different natures even when linguistic information and resources are scarce or unavailable. Our proposal also aims to speed up the n-gram alignment process relative to previous approaches.
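
As a rough illustration of why character n-grams sidestep word normalization, the following minimal sketch splits text into overlapping character n-grams. It is not the authors' implementation and does not cover the alignment or translation steps. The function name `char_ngrams`, the default `n=4`, and the whitespace tokenization are assumptions made for this example only.

```python
def char_ngrams(text, n=4):
    """Return the overlapping character n-grams of each word in `text`.

    Hypothetical helper for illustration: words are lowercased and split
    on whitespace; words no longer than n are kept whole.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            # Slide a window of width n across the word, one character at a time.
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


# Inflected forms share most of their n-grams, so no stemmer is required.
singular = set(char_ngrams("tomato"))
plural = set(char_ngrams("tomatoes"))
print(sorted(singular & plural))
```

In this sketch the singular and plural forms overlap on every n-gram of the shorter word, which is the property that lets an n-gram index match word variants, and unseen words, without language-specific processing.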