Corrupted queries in Spanish text retrieval: error correction vs. N-Grams

  • Authors:
  • Juan Otero;Jesús Vilares;Manuel Vilares Ferro

  • Affiliations:
  • University of Vigo, Ourense, Spain;University of A Coruña, A Coruña, Spain;University of Vigo, Ourense, Spain

  • Venue:
  • Proceedings of the 2nd ACM workshop on Improving non english web searching
  • Year:
  • 2008

Abstract

In this paper, we propose and evaluate two alternatives for dealing with degraded queries in Spanish IR applications. The first is an n-gram-based strategy that does not depend on the degree of available linguistic knowledge. The second consists of two spelling correction techniques, one of which depends strongly on a stochastic model that must be built beforehand from a POS-tagged corpus. To study their validity, a testing framework has been formally designed and applied to both approaches.
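
To illustrate why an n-gram-based strategy can tolerate degraded queries without linguistic resources, the following is a minimal sketch, not the authors' implementation. It assumes overlapping character n-grams of length 4 and a Dice coefficient as the similarity measure; the abstract specifies neither, and the function names `char_ngrams` and `dice_overlap` are hypothetical.

```python
def char_ngrams(word: str, n: int = 4) -> list[str]:
    """Split a word into overlapping character n-grams.

    Words no longer than n are kept whole (an assumption of this sketch).
    """
    word = word.lower()
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dice_overlap(a: str, b: str, n: int = 4) -> float:
    """Dice coefficient over the n-gram sets of two words.

    Used here only to show that a misspelled word still shares
    some n-grams with its correct form.
    """
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    # "consutla" is "consulta" (Spanish for "query") with two letters transposed.
    print(char_ngrams("consulta"))
    print(char_ngrams("consutla"))
    print(dice_overlap("consulta", "consutla"))
```

In this sketch the corrupted word still shares the n-grams `cons` and `onsu` with the correct one, so a partial match remains possible where whole-word matching would fail outright.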