Using contextual spelling correction to improve retrieval effectiveness in degraded text collections

Authors: Patrick Ruch
Affiliations: Swiss Federal Institute of Technology, Lausanne, Switzerland
Venue: COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Year: 2002

Abstract

This study presents the design and evaluation of an improved IR system able to cope with textual misspellings. After selecting an optimal weighting scheme for the engine, we evaluate the effect of misspellings on retrieval effectiveness. We then compare the improvement brought to the engine by the addition of two different non-interactive spelling correction strategies: a classical one, based on a string-to-string edit distance calculation, and a contextual one, which adds linguistically motivated features to the string distance module. The results for the latter suggest that the loss of average precision in degraded texts can be reduced to a few percent (4%).
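
As an illustration of the classical strategy mentioned in the abstract, the following is a minimal sketch of non-interactive correction driven by string-to-string edit distance. It is not the author's implementation: the function names `edit_distance` and `correct`, the toy lexicon, and the `max_distance` threshold are all assumptions made for this example, and the contextual, linguistically motivated features are not modelled.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct(token: str, lexicon: list[str], max_distance: int = 2) -> str:
    """Hypothetical helper: replace token with its nearest lexicon entry,
    or keep it unchanged when no entry is within max_distance (assumed
    threshold). Ties go to the entry listed first in the lexicon."""
    if not lexicon or token in lexicon:
        return token
    best = min(lexicon, key=lambda word: edit_distance(token, word))
    return best if edit_distance(token, best) <= max_distance else token


if __name__ == "__main__":
    toy_lexicon = ["retrieval", "retrieve", "spelling"]  # assumed example data
    print(correct("retreival", toy_lexicon))
    print(correct("xyzzy", toy_lexicon))
```

Because the correction is non-interactive, the sketch always commits to a single candidate. A contextual strategy of the kind the abstract describes would instead use additional features to choose among candidates at similar distances.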