One of the most important primitive data types in modern data processing is text. Text data are known to contain a variety of inconsistencies (e.g., spelling mistakes and representational variations). For that reason, there exists a large body of literature on approximate processing of text. This monograph focuses specifically on the problem of approximate string matching, where, given a set of strings S and a query string v, the goal is to find all strings s ∈ S that have a user-specified degree of similarity to v. Set S could be, for example, a corpus of documents, a set of web pages, or an attribute of a relational table. The similarity between strings is always defined with respect to a similarity function that is chosen based on the characteristics of the data and the application at hand. This work presents a survey of indexing techniques and algorithms specifically designed for approximate string matching. We concentrate on inverted indexes, filtering techniques, and tree data structures that can be used to evaluate a variety of set-based and edit-based similarity functions. We focus on all-match and top-k flavors of selection and join queries, and discuss the applicability, advantages, and disadvantages of each technique for every query type.
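To make the problem statement concrete, the following is a minimal sketch of an all-match selection query under an edit-distance threshold, using a q-gram inverted index with a count filter and a length filter before verification. It is an illustration under stated assumptions, not the monograph's own algorithm: the function names (`qgrams`, `build_index`, `search`), the choice of unpadded q-grams with q = 2, and the sample strings are all hypothetical. The count filter relies on the standard bound that two strings within edit distance k share at least max(|s|, |v|) - q + 1 - k·q q-grams (counted as multisets); the sketch uses the weaker but still safe bound based on |v| alone.

```python
from collections import Counter, defaultdict


def qgrams(s, q=2):
    """Multiset of unpadded q-grams of s (empty if len(s) < q)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def build_index(strings, q=2):
    """Inverted index: q-gram -> {string id: occurrences in that string}."""
    index = defaultdict(dict)
    for sid, s in enumerate(strings):
        for gram, count in qgrams(s, q).items():
            index[gram][sid] = count
    return index


def search(strings, index, v, k, q=2):
    """Return all strings s with edit_distance(s, v) <= k, sorted."""
    # Count filter: any answer shares at least this many q-grams with v.
    threshold = len(v) - q + 1 - k * q
    if threshold <= 0:
        # The filter cannot prune anything; fall back to a full scan.
        candidates = range(len(strings))
    else:
        overlap = Counter()
        for gram, qcount in qgrams(v, q).items():
            for sid, scount in index.get(gram, {}).items():
                overlap[sid] += min(qcount, scount)
        candidates = [sid for sid, c in overlap.items() if c >= threshold]
    # Length filter, then exact verification of the surviving candidates.
    return sorted(
        strings[sid]
        for sid in candidates
        if abs(len(strings[sid]) - len(v)) <= k
        and edit_distance(strings[sid], v) <= k
    )


# Hypothetical sample data for illustration only.
S = ["approximate", "appropriate", "apple", "string", "strings", "sting"]
INDEX = build_index(S)
print(search(S, INDEX, "strng", 1))  # ['sting', 'string']
```

The index only narrows the candidate set; every candidate is still verified with the exact similarity function, so the filters affect running time but not the result. A top-k or join variant would reuse the same index with a different driver loop.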