Our work concerns the design of robust information retrieval environments that can successfully handle queries containing misspelled words. We present a comparative analysis of the efficacy of two strategies. The first corrects the misspelled query, which requires integrating linguistic information into the system. We study this solution from two complementary standpoints, according to whether or not contextual information of a linguistic nature is integrated in the process, the former implying a higher degree of complexity. The second strategy uses character n-grams as the basic indexing unit, which guarantees the robustness of the retrieval process while eliminating the need for a specific query correction stage. This is a knowledge-light, language-independent solution that requires no linguistic information for its application. Both strategies have been tested experimentally, using Spanish as the case in point. Unlike English, Spanish has a great variety of morphological processes, making it particularly sensitive to spelling errors. The results show that stemming-based approaches are highly sensitive to misspelled queries, particularly short ones. However, this negative impact can be effectively reduced by applying correction mechanisms during querying, particularly context-based correction, since more classical approaches introduce too much noise as query length increases. In contrast, our n-gram based strategy shows remarkable robustness, with average performance losses appreciably smaller than those for stemming.
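
To illustrate why character n-grams tolerate misspellings without a correction stage, the following is a minimal Python sketch, not the system evaluated in this work. The function names `char_ngrams` and `dice_overlap`, the choice of n = 4, and the use of the Dice coefficient as an overlap measure are all assumptions made for illustration only; the abstract does not specify any of them.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Split text into overlapping character n-grams, word by word.

    Hypothetical helper: n = 4 is an illustrative default, not a value
    taken from the abstract. Words no longer than n are kept whole.
    """
    grams: list[str] = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def dice_overlap(a: list[str], b: list[str]) -> float:
    """Dice coefficient between two n-gram lists, treated as sets.

    Used here only to show partial matching; a real retrieval engine
    would apply its own ranking model over the n-gram index.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# A correctly spelled term and a misspelling with two letters transposed.
correct = char_ngrams("informacion")
misspelled = char_ngrams("infromacion")

# Whole-word matching finds nothing in common, but the n-gram
# representations still share the grams untouched by the error.
shared = set(correct) & set(misspelled)
score = dice_overlap(correct, misspelled)
```

Under these assumptions, the transposition damages only the n-grams that span the error, so the misspelled query term still shares part of its representation with the indexed form. An exact match on the whole word, or on a stem derived from it, would fail entirely.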