In this paper we propose a dictionary data structure for string search with errors, where the query string may differ from the expected matching string by a few edits. This data structure can also be used to find the database string that shares the longest common prefix with the query, up to a few errors. Specifically, with a database of n random strings, each of length O(m), we show how to perform string search on a query string that differs from its closest match by k edits using a data structure of linear size and query time equal to $\tilde{O}\left(\log n \cdot 2^{\log n \cdot \frac{k \log_a 2m}{2m}}\right)$. This means that if $k \le \frac{m}{\log_a 2m \cdot \log n}$, then the query time is $\tilde{O}(1)$. This is significant in practice, as there are several applications where k is small relative to m. Our approach converts strings into bit vectors so that similar strings map to similar bit vectors with small Hamming distance. A simple reduction can be used to obtain similar results for approximate longest prefix search.
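To make the idea of mapping similar strings to bit vectors at small Hamming distance concrete, the following is a minimal sketch. It is not the construction from the paper: it swaps in a simple hashed q-gram signature, and it answers queries by a linear scan rather than with the proposed data structure. The names `qgram_bits`, `hamming`, and `nearest`, and the parameters `q` and `width`, are assumptions introduced for illustration only.

```python
import hashlib


def qgram_bits(s: str, q: int = 3, width: int = 64) -> int:
    """Return a width-bit signature with one bit set per distinct q-gram of s.

    Hypothetical scheme for illustration; not the embedding used in the paper.
    """
    bits = 0
    for i in range(len(s) - q + 1):
        gram = s[i:i + q].encode("utf-8")
        # Use a stable hash so signatures do not vary between interpreter runs.
        digest = hashlib.blake2b(gram, digest_size=8).digest()
        bits |= 1 << (int.from_bytes(digest, "big") % width)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which the two signatures differ."""
    return bin(a ^ b).count("1")


def nearest(query: str, database: list[str], q: int = 3, width: int = 64) -> str:
    """Linear scan for the database string whose signature is closest to the query's."""
    query_bits = qgram_bits(query, q, width)
    return min(database, key=lambda s: hamming(query_bits, qgram_bits(s, q, width)))
```

In this sketch a single edit removes at most q q-grams and introduces at most q new ones, so k edits move the signature by at most 2qk bit positions. That bound is the property the sketch is meant to show: a few edits yield a nearby bit vector. A call such as `nearest("aproximate", ["dictionary", "approximate", "hamming"])` compares the query signature against every stored signature, which takes time linear in the database size, unlike the sublinear query time claimed above.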