Techniques for automatically correcting words in text
ACM Computing Surveys (CSUR)
Dictionary organizations for efficient similarity retrieval
Journal of Systems and Software
ACM Computing Surveys (CSUR)
A hash code method for detecting and correcting spelling errors
Communications of the ACM
A technique for computer detection and correction of spelling errors
Communications of the ACM
A guided tour to approximate string matching
ACM Computing Surveys (CSUR)
Approximate String-Matching over Suffix Trees
CPM '93 Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching
Dictionary matching and indexing with errors and don't cares
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Fast Approximate Search in Large Dictionaries
Computational Linguistics
Contextual Postprocessing System for Cooperation with a Multiple-Choice Character-Recognition System
IEEE Transactions on Computers
Faster and Space-Optimal Edit Distance "1" Dictionary
CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Directly Addressable Variable-Length Codes
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Indexing methods for approximate dictionary searching: Comparative analysis
Journal of Experimental Algorithmics (JEA)
We present an experimental analysis of approximate search algorithms that index deletion neighborhoods. These methods require huge indices whose sizes grow exponentially with the maximum allowable number of errors k. Despite their extraordinary space requirements, these super-linear indices are of great interest because they provide some of the shortest retrieval times. A straightforward implementation that builds a hash index directly over residual strings (obtained by deleting characters from dictionary words) is not space efficient. Rather than storing complete residual strings, we record only the deleted characters and their positions. These data are indexed using a perfect hash function computed for the set of residual dictionary strings [2]. We evaluate this approach experimentally against several well-known benchmarks, including FastSS, which stores residual strings directly [3]. The experiments show that our implementation performs comparably to or better than the fastest benchmarks, while requiring 4 to 8 times less space than FastSS.
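The idea of indexing deletion neighborhoods can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several assumptions: an ordinary Python dict stands in for the perfect hash function (so, unlike the method described above, the residual strings are still held as keys), the names `deletion_neighborhood`, `restore`, `build_index`, and `search` are hypothetical, and candidates are verified with plain Levenshtein distance. What the sketch does show is that each index entry can hold only the deleted characters and their positions, from which the dictionary word is rebuilt at query time.

```python
from itertools import combinations


def deletion_neighborhood(word, k):
    """Yield (residual, positions, deleted) for every way of deleting up to k characters."""
    for d in range(min(k, len(word)) + 1):
        for positions in combinations(range(len(word)), d):
            removed = set(positions)
            residual = "".join(c for i, c in enumerate(word) if i not in removed)
            deleted = "".join(word[i] for i in positions)
            yield residual, positions, deleted


def restore(residual, positions, deleted):
    """Rebuild the original word from a residual plus the deleted characters and positions."""
    chars = list(residual)
    # Positions are ascending indices into the original word, so inserting
    # in order puts every character back where it came from.
    for pos, ch in zip(positions, deleted):
        chars.insert(pos, ch)
    return "".join(chars)


def edit_distance(a, b):
    """Plain Levenshtein distance (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(words, k):
    """Map each residual to a list of (positions, deleted) entries.

    Assumption: a dict replaces the perfect hash function, so the residual
    keys are stored here; only the values follow the compact encoding.
    """
    index = {}
    for word in words:
        for residual, positions, deleted in deletion_neighborhood(word, k):
            index.setdefault(residual, []).append((positions, deleted))
    return index


def search(index, query, k):
    """Return all indexed words within edit distance k of the query."""
    matches = set()
    for residual, _, _ in deletion_neighborhood(query, k):
        for positions, deleted in index.get(residual, []):
            candidate = restore(residual, positions, deleted)
            # A shared residual only guarantees distance <= 2k, so verify.
            if edit_distance(query, candidate) <= k:
                matches.add(candidate)
    return matches


index = build_index(["apple", "ample", "maple", "angle"], k=1)
print(sorted(search(index, "aple", k=1)))
```

In this sketch a word of length n contributes one entry for each way of choosing up to k deletion positions, which is why the index size grows so quickly with k. The verification step in `search` is needed because two strings that share a residual after at most k deletions each may still be up to 2k edits apart.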