Theoretical Computer Science
The R*-tree: an efficient and robust access method for points and rectangles
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Towards an analysis of range query performance in spatial data structures
PODS '93 Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
CIKM '93 Proceedings of the second international conference on Information and knowledge management
Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension
PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
A model for the prediction of R-tree performance
PODS '96 Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Multidimensional access methods
ACM Computing Surveys (CSUR)
PATRICIA—Practical Algorithm To Retrieve Information Coded in Alphanumeric
Journal of the ACM (JACM)
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
Block addressing indices for approximate text retrieval
Journal of the American Society for Information Science - Special topic issue: When museum informatics meets the World Wide Web
A guided tour to approximate string matching
ACM Computing Surveys (CSUR)
ACM Computing Surveys (CSUR)
ACM Computing Surveys (CSUR)
Combinatorial Algorithms on Words
Combinatorial Algorithms on Words
R-trees: a dynamic index structure for spatial searching
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Probabilistic Analysis of Generalized Suffix Trees (Extended Abstract)
CPM '92 Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching
Approximate String-Matching over Suffix Trees
CPM '93 Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching
Filtration with q-Samples in Approximate String Matching
CPM '96 Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching
A Metric Index for Approximate String Matching
LATIN '02 Proceedings of the 5th Latin American Symposium on Theoretical Informatics
Index-driven similarity search in metric spaces (Survey Article)
ACM Transactions on Database Systems (TODS)
Dictionary matching and indexing with errors and don't cares
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
A compact space decomposition for effective metric indexing
Pattern Recognition Letters
A new method for approximate indexing and dictionarylookup with one error
Information Processing Letters
Linear-time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Space efficient linear time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Simple linear work suffix array construction
ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
Testing embeddability between metric spaces
CATS '08 Proceedings of the fourteenth symposium on Computing: the Australasian theory - Volume 77
ICANNGA '07 Proceedings of the 8th international conference on Adaptive and Natural Computing Algorithms, Part II
Fast index for approximate string matching
Journal of Discrete Algorithms
Hi-index | 5.23 |
We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space. This permits us finding the occ occurrences of a pattern of length m, permitting up to r differences, in a text of length n over an alphabet of size σ, in average time O(m1+ε + occ) for any ε 0, if r = o(m/logσm) and m ((1 + ε)/ε)logσ n. The index works well up to r m/logσm, where it achieves its maximum average search complexity O(m1+√2+ε + occ). The construction time of the index is O(m1+√2+ε nlog n) and its space is O(m1+√2+εn). This is the first index achieving average search time polynomial in m and independent of n, for r = O(m/logσm). Previous methods achieve this complexity only for r = O(m/logσ n). We also present a simpler scheme needing O(n) space.