Many document-based applications, including popular Web browsers, email viewers, and word processors, have a 'Find on this Page' feature that lets a user find every occurrence of a given string in the document. If the text being searched is derived from a noisy process such as optical character recognition (OCR), the effectiveness of typical string matching can be greatly reduced. This paper describes an enhanced string-matching algorithm for degraded text that improves recall while keeping precision at acceptable levels. The algorithm is more general than most approximate matching algorithms and allows string-to-string edits with arbitrary costs. We develop a method for evaluating our technique and use it to examine the relative effectiveness of each sub-component of the algorithm. Of the components we varied, we find that using confidence information from the recognition process leads to the largest improvements in matching accuracy.
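To make the idea of string-to-string edits with arbitrary costs concrete, the following is a minimal sketch of approximate substring search by dynamic programming. It is not the algorithm from the paper: the function name `find_approx`, the `rules` table, the unit cost for ordinary single-character edits, and the 0.2 cost for the OCR-style confusions are all assumptions chosen for illustration, and the sketch does not use recognition confidence information.

```python
def find_approx(pattern, text, rules, max_cost=1.0):
    """Return (end_index, cost) pairs where `pattern` matches a substring
    of `text` ending at end_index with total edit cost <= max_cost.

    `rules` is a hypothetical list of (pattern_part, text_part, cost)
    triples: pattern_part may appear in the text as text_part at that cost.
    """
    m, n = len(pattern), len(text)
    inf = float("inf")
    # d[i][j]: cheapest cost of matching pattern[:i] against some
    # substring of the text that ends at position j.
    d = [[inf] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        d[0][j] = 0.0  # a match may start anywhere in the text

    for i in range(1, m + 1):
        for j in range(n + 1):
            # Pattern character missing from the text (assumed cost 1).
            best = d[i - 1][j] + 1.0
            if j > 0:
                # Extra character in the text (assumed cost 1).
                best = min(best, d[i][j - 1] + 1.0)
                # Exact match (cost 0) or plain substitution (assumed cost 1).
                sub = 0.0 if pattern[i - 1] == text[j - 1] else 1.0
                best = min(best, d[i - 1][j - 1] + sub)
            # String-to-string edits with their own costs.
            for src, dst, cost in rules:
                a, b = len(src), len(dst)
                if a + b == 0 or a > i or b > j:
                    continue
                if pattern[i - a:i] == src and text[j - b:j] == dst:
                    best = min(best, d[i - a][j - b] + cost)
            d[i][j] = best

    return [(j, d[m][j]) for j in range(n + 1) if d[m][j] <= max_cost]


# Illustrative OCR confusions: "m" read as "rn", and "rn" read as "m".
ocr_rules = [("m", "rn", 0.2), ("rn", "m", 0.2)]
hits = find_approx("modern", "the rnodem age", ocr_rules, max_cost=0.5)
```

In this example an exact search for "modern" finds nothing in the degraded text, while the sketch reports a single hit ending at index 10 with cost 0.4 (two confusions at 0.2 each). Raising `max_cost` trades precision for recall, which is the balance the abstract describes.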