Corrupted queries in Spanish text retrieval: error correction vs. N-Grams

Authors: Juan Otero; Jesús Vilares; Manuel Vilares Ferro
Affiliations: University of Vigo, Ourense, Spain; University of A Coruña, A Coruña, Spain; University of Vigo, Ourense, Spain
Venue: Proceedings of the 2nd ACM Workshop on Improving Non-English Web Searching
Year: 2008

Abstract

In this paper, we propose and evaluate two alternatives for handling degraded queries in Spanish IR applications. The first is an n-gram-based strategy that does not depend on the amount of linguistic knowledge available. The second comprises two spelling correction techniques, one of which depends heavily on a stochastic model that must first be trained on a POS-tagged corpus. To assess their validity, we formally designed a testing framework and applied it to both approaches.
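To make the contrast between the two alternatives concrete, here is a minimal sketch in Python. It does not reproduce the authors' methods. The abstract gives neither the n-gram length nor the correction algorithms, so this sketch assumes overlapping character 4-grams and substitutes plain Levenshtein distance against a small word list. The correction side is therefore a deliberately simpler stand-in, and it omits the POS-trained stochastic model that the abstract mentions. All function names (`char_ngrams`, `ngram_overlap`, `levenshtein`, `correct`) and the sample Spanish lexicon are hypothetical.

```python
from collections import Counter


def char_ngrams(word: str, n: int = 4) -> list[str]:
    """Overlapping character n-grams of one word (assumed n = 4).
    Words no longer than n are kept whole as a single gram."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_overlap(query_term: str, doc_term: str, n: int = 4) -> float:
    """Fraction of the query term's n-grams also found in the document term.
    A typo damages only the few grams around it, so most grams still match."""
    q = Counter(char_ngrams(query_term, n))
    d = Counter(char_ngrams(doc_term, n))
    shared = sum((q & d).values())  # multiset intersection
    return shared / max(sum(q.values()), 1)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def correct(term: str, lexicon: list[str], max_dist: int = 2) -> str:
    """Hypothetical dictionary-based correction: replace the term with the
    closest lexicon word if it lies within max_dist edits, else keep it."""
    best = min(lexicon, key=lambda w: (levenshtein(term, w), w))
    return best if levenshtein(term, best) <= max_dist else term


if __name__ == "__main__":
    lexicon = ["elecciones", "selecciones", "lecciones"]  # toy word list
    typo = "eleciones"  # misspelling of "elecciones"
    print(char_ngrams(typo))
    print(round(ngram_overlap(typo, "elecciones"), 3))
    print(correct(typo, lexicon))
```

For the misspelled query term "eleciones", 4 of its 6 character 4-grams still occur in "elecciones". An n-gram index therefore still matches the intended word without consulting any lexicon. The correction route instead needs a word list, and it must pick the single closest word, "elecciones" at one edit, over the other candidates, which each lie two edits away. This mirrors the trade-off described in the abstract: n-grams need no linguistic resources, whereas correction depends on them.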