Fast motif search in protein sequence databases
CSR'06 Proceedings of the First international computer science conference on Theory and Applications
Given a dictionary ${\mathcal W}$ consisting of n binary strings of length m each, a d-query asks whether there exists a string in ${\mathcal W}$ within Hamming distance d of a given binary query string q. The problem was posed by Minsky and Papert in 1969 as a challenge to data structure design. Answering a d-query involves a tradeoff between time and space. Recently developed time-efficient methods for text indexing with errors can be used to answer a d-query in $O(m)$ time. However, these methods use $O(n\log^{d} n)$ (or more) additional space, which is not practical for large databases. We present a method for the problem assuming the standard RAM model of computation. We process the dictionary to construct an edge-labelled tree in which sibling edges carry distinct labels, and whose branching factor and height are bounded. Storing the resulting tree does not require asymptotically more space than an ordinary trie that stores the given dictionary. We present an algorithm for the d-query problem that takes $O(m(3\log_{4/3} n - 1)^{d}(\log_{2} n)^{d+1})$ time and uses only $O(m)$ additional space. We also generalize the results to larger alphabets and to edit distance. For an alphabet $\Sigma$ under Hamming distance, we achieve $O(m(2|\Sigma|-1)^{d}(\log_{(2|\Sigma|-1)} n - 1)^{d}(\log_{2} n)^{d+1})$ time complexity. The time complexity increases by a factor of $O(d(2|\Sigma|-1)^{d}(\log_{2} n)^{d})$ when edit distance is used. The algorithms are efficient when the approximate dictionary look-up involves long words defined over small alphabets. The algorithm can be modified to allow dictionary words of different lengths as well as query strings of different lengths.
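To make the d-query definition concrete, the following is a minimal sketch of a baseline that answers a d-query by walking an ordinary trie with a mismatch budget. It is not the bounded-height tree construction described above and does not achieve the stated time bounds; it only illustrates the question a d-query asks. The names `TrieNode`, `build_trie`, and `d_query` are hypothetical and chosen for illustration, and all dictionary words are assumed to have the same length as the query.

```python
from typing import Dict, Iterable


class TrieNode:
    """Node of an ordinary trie; edges to siblings carry distinct labels."""
    __slots__ = ("children", "terminal")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.terminal = False


def build_trie(words: Iterable[str]) -> TrieNode:
    """Insert every dictionary word into a trie, one character per edge."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True
    return root


def d_query(root: TrieNode, q: str, d: int) -> bool:
    """Return True if some stored word is within Hamming distance d of q.

    Baseline only: explores every trie path that uses at most d mismatches.
    """
    # Each stack entry: (trie node, position in q, mismatches still allowed).
    stack = [(root, 0, d)]
    while stack:
        node, i, budget = stack.pop()
        if i == len(q):
            if node.terminal:
                return True
            continue
        for ch, child in node.children.items():
            cost = 0 if ch == q[i] else 1
            if cost <= budget:
                stack.append((child, i + 1, budget - cost))
    return False


# Example dictionary of n = 3 binary strings of length m = 4 (illustrative data).
EXAMPLE_ROOT = build_trie(["0000", "0110", "1111"])
```

With this example dictionary, `d_query(EXAMPLE_ROOT, "0100", 1)` succeeds because "0000" differs from the query in one position, while `d_query(EXAMPLE_ROOT, "1001", 1)` fails because every stored word differs from "1001" in at least two positions. The number of explored paths grows quickly with d, which is the cost the tree construction in the abstract is designed to bound.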