Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Efficient 2-dimensional approximate matching of half-rectangular figures
Information and Computation
Two algorithms for nearest-neighbor search in high dimensions
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Efficient search for approximate nearest neighbor in high dimensional spaces
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
An optimal algorithm for approximate nearest neighbor searching
SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
Pattern matching for sets of segments
SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
Finding motifs using random projections
RECOMB '01 Proceedings of the fifth annual international conference on Computational biology
Provably sensitive Indexing strategies for biosequence similarity search
Proceedings of the sixth annual international conference on Computational biology
Similarity Search in High Dimensions via Hashing
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Identifying Representative Trends in Massive Time Series Data Sets Using Sketches
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
LATIN '00 Proceedings of the 4th Latin American Symposium on Theoretical Informatics
Simple and Practical Sequence Nearest Neighbors with Block Operations
CPM '02 Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching
Optimal suffix tree construction with large alphabets
FOCS '97 Proceedings of the 38th Annual Symposium on Foundations of Computer Science
Faster Algorithms for String Matching Problems: Matching the Convolution Bound
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
Stable distributions, pseudorandom generators, embeddings and data stream computation
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Locality-sensitive hashing scheme based on p-stable distributions
SCG '04 Proceedings of the twentieth annual symposium on Computational geometry
Dictionary matching and indexing with errors and don't cares
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Efficient randomized pattern-matching algorithms
IBM Journal of Research and Development - Mathematics and computing
A dictionary for approximate string search and longest prefix search
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Nearest neighbor search methods for handshape recognition
Proceedings of the 1st international conference on PErvasive Technologies Related to Assistive Environments
Overcoming the l1 non-embeddability barrier: algorithms for product metrics
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Query expansion for hash-based image object retrieval
MM '09 Proceedings of the 17th ACM international conference on Multimedia
HARRA: fast iterative hashed record linkage for large-scale data collections
Proceedings of the 13th International Conference on Extending Database Technology
Fingerprints in compressed strings
WADS'13 Proceedings of the 13th international conference on Algorithms and Data Structures
Streaming similarity search over one billion tweets using parallel locality-sensitive hashing
Proceedings of the VLDB Endowment
Optimal Lower Bounds for Locality-Sensitive Hashing (Except When q is Tiny)
ACM Transactions on Computation Theory (TOCT)
Hi-index | 0.00 |
In this paper we consider the problem of finding the approximate nearest neighbor when the data set points are the substrings of a given text T. Specifically, for a string T of length n, we present a data structure which does the following: given a pattern P, if there is a substring of T within the distance R from P, it reports a (possibly different) substring of T within distance cR from P. The length of the pattern P, denoted by m, is not known in advance. For the case where the distances are measured using the Hamming distance, we present a data structure which uses Õ(n1+1/c) space1 and with Õ(n1/c + mno(1)) query time. This essentially matches the earlier bounds of [Ind98], which assumed that the pattern length m is fixed in advance. In addition, our data structure can be constructed in time Õ(n1+1/c + n1+o(1)M1/3), where M is an upper bound for m. This essentially matches the preprocessing bound of [Ind98] as long as the term Õ(n1+1/c) dominates the running time, which is the case when, e.g., c l1 distance. The query time and the space bound are essentially the same, while the preprocessing time becomes Õ(n1+1/c + n1+o(1)M2/3).