Improved fast similarity search in dictionaries

Authors:
Daniel Karch; Dennis Luxen; Peter Sanders
Affiliations:
Karlsruhe Institute of Technology; Karlsruhe Institute of Technology; Karlsruhe Institute of Technology
Venue:
SPIRE '10: Proceedings of the 17th International Conference on String Processing and Information Retrieval
Year:
2010

Citing (7):
A hash code method for detecting and correcting spelling errors. Communications of the ACM
Some approaches to best-match file searching. Communications of the ACM
Searching in metric spaces. ACM Computing Surveys (CSUR)
Better filtering with gapped q-grams. Fundamenta Informaticae - Special issue on computing patterns in strings
Dictionary matching and indexing with errors and don't cares. STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Fast Approximate Search in Large Dictionaries. Computational Linguistics
Increased bit-parallelism for approximate and multiple string matching. Journal of Experimental Algorithmics (JEA)

Cited by (2):
Engineering efficient error-correcting geocoding. Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
Flexible and efficient string similarity search with alignment-space transform. Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication

Abstract

We engineer an algorithm to solve the approximate dictionary matching problem. Given a list of words W, a maximum distance d fixed at preprocessing time, and a query word q, we would like to retrieve all words from W that can be transformed into q with d or fewer edit operations. We present data structures that support fault-tolerant queries by generating an index. On top of that, we present a generalization of the method that significantly reduces memory consumption and preprocessing time, while query running times are virtually unaffected. We are able to match in lists of hundreds of thousands of words and beyond within microseconds for reasonable distances.
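
To make the problem statement concrete, the following is a minimal brute-force sketch of approximate dictionary matching. It is not the index-based method of the paper: it scans W linearly and serves only as a reference for what a correct answer looks like. It assumes that "edit operations" means unit-cost insertions, deletions, and substitutions (Levenshtein distance); the function names `edit_distance` and `approximate_matches` are hypothetical.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed cost model: unit-cost insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def approximate_matches(words: Iterable[str], q: str, d: int) -> List[str]:
    """Return all words within edit distance d of q, using a linear scan (no index)."""
    return [w for w in words if edit_distance(w, q) <= d]
```

For example, `approximate_matches(["karlsruhe", "karlsbad", "hamburg"], "karlsruh", 1)` keeps only the words that are at most one edit away from the query. Every query in this baseline costs time proportional to the size of W, which is the cost an index built at preprocessing time is meant to avoid.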