A dictionary for approximate string search and longest prefix search

Authors:
Sreenivas Gollapudi;Rina Panigrahy
Affiliations:
Microsoft Search Labs;Stanford University
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Citing 15
Cited 2

Efficient string matching with k mismatches

Theoretical Computer Science
A tale of three spelling checkers

Software—Practice & Experience
Approximate nearest neighbors: towards removing the curse of dimensionality

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Efficient search for approximate nearest neighbor in high dimensional spaces

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Min-wise independent permutations

Journal of Computer and System Sciences - 30th annual ACM symposium on theory of computing
Space/time trade-offs in hash coding with allowable errors

Communications of the ACM
A guided tour to approximate string matching

ACM Computing Surveys (CSUR)
Chord: A scalable peer-to-peer lookup service for internet applications

Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Tries for Approximate String Matching

IEEE Transactions on Knowledge and Data Engineering
On the Resemblance and Containment of Documents

SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice

ACM Transactions on Computer Systems (TOCS)
Dictionary matching and indexing with errors and don't cares

STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Approximating Edit Distance Efficiently

FOCS '04 Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science
Low distortion embeddings for edit distance

Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
Efficient algorithms for substring near neighbor problem

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm

Indexing methods for approximate dictionary searching: Comparative analysis

Journal of Experimental Algorithmics (JEA)
The smoothed complexity of edit distance

ACM Transactions on Algorithms (TALG)

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we propose a dictionary data structure for string search with errors where the query string may didiffer from the expected matching string by a few edits. This data structure can also be used to find the database string with the longest common prefix with few errors. Specifically, with a database of n random strings, each of length of O(m), we show how to perform string search on a query string that differs from its closest match by k edits using a data structure of linear size and query time equal to Õ(log n 2 log n klog a 2m over 2m). This means that if k m over log a 2m log n, then the query time is Õ(1). This is of significant in practice as there are several applications where k is small relative to m. Our approach converts strings into bit vectors so that similar strings can map to similar bit vectors with small hamming distance. A simple reduction can be used to obtain similar results for approximate longest prefix search.