Dictionary matching and indexing with errors and don't cares

Authors:
Richard Cole;Lee-Ad Gottlieb;Moshe Lewenstein
Affiliations:
New York University, NY, NY;New York University, NY, NY;Bar-Ilan University, Ramat Gan, Israel
Venue:
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Year:
2004

Abstract

This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string p, all matches of p with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of "don't care" characters are allowed. The specific problems we look at are: indexing, in which there is a single text t, and we seek locations where p matches a substring of t; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match p in their entirety; and dictionary matching, in which a collection of strings is given upfront, and we seek those substrings of a (long) p which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution.

The performance bounds all have a similar character. For example, for the indexing problem with $n = |t|$ and $m = |p|$, the query time for $k$ substitutions is $O(m + (c_1 \log n)^k / k! + \#\text{matches})$, with a data structure of size $O(n (c_2 \log n)^k / k!)$ and a preprocessing time of $O(n (c_2 \log n)^k / k!)$, where $c_1, c_2 > 1$ are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.
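
To make the three problem statements concrete, the following is a minimal brute-force sketch in Python of what each query is expected to return when up to k mismatches are allowed. It is not the data structure from the paper: it does no preprocessing and scans everything on each query, so it achieves none of the bounds quoted in the abstract. The function names (hamming_within, index_query, dictionary_query, dictionary_matching) are hypothetical names chosen for this illustration, and "don't care" characters are left out for brevity.

```python
def hamming_within(a, b, k):
    """Return True if a and b have equal length and differ in at most k positions."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def index_query(t, p, k):
    """Indexing: start positions in text t where p matches a substring with <= k mismatches."""
    m = len(p)
    return [i for i in range(len(t) - m + 1) if hamming_within(t[i:i + m], p, k)]


def dictionary_query(words, p, k):
    """Dictionary query: the dictionary strings that match p in their entirety."""
    return [w for w in words if hamming_within(w, p, k)]


def dictionary_matching(words, p, k):
    """Dictionary matching: (start, word) pairs where a substring of p matches a whole word."""
    hits = []
    for w in words:
        for i in range(len(p) - len(w) + 1):
            if hamming_within(p[i:i + len(w)], w, k):
                hits.append((i, w))
    return hits


if __name__ == "__main__":
    print(index_query("banana", "bana", 1))
    print(dictionary_query(["cat", "cot", "dog", "cart"], "cat", 1))
    print(dictionary_matching(["cat", "dog"], "hotdogcut", 1))
```

Each naive query above costs time proportional to the full input size times the pattern length, which is the cost the preprocessing described in the abstract is meant to avoid.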