The primary goal of this article is to survey state-of-the-art indexing methods for approximate dictionary searching. To improve understanding of the field, we introduce a taxonomy that classifies all methods into direct methods and sequence-based filtering methods. We focus on infrequently updated dictionaries, which are used primarily for retrieval, and therefore consider indices that are optimized for retrieval rather than for update. The indices are assumed to be associative, that is, capable of storing and retrieving auxiliary information such as string identifiers. All solutions are lossless: they guarantee retrieval of every dictionary string within a specified edit distance k of the search pattern. Benchmark results are presented for the practically important cases of k = 1, 2, and 3. We concentrate on natural language datasets, which include synthetic English and Russian dictionaries as well as dictionaries of frequent words extracted from the ClueWeb09 collection. In addition, we carry out experiments with dictionaries containing DNA sequences. The article concludes with a discussion of the benchmark results and directions for future research.
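To make the problem statement concrete, the following is a minimal Python sketch of lossless, associative approximate dictionary search done by a plain sequential scan. It is not one of the indexing methods surveyed in the article, only a baseline that illustrates what such an index must return. The sketch assumes unit-cost Levenshtein distance (insertions, deletions, substitutions) as the edit distance; the names `edit_distance` and `approx_search` and the sample dictionary are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (assumed definition of edit distance)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep DP rows short
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approx_search(dictionary: Dict[str, int], query: str,
                  k: int) -> List[Tuple[str, int, int]]:
    """Return (string, identifier, distance) for every entry within distance k.

    The dictionary maps each string to an identifier, mirroring the
    "associative" requirement. The scan is lossless because every entry
    is either checked exactly or skipped by a safe length filter.
    """
    results = []
    for word, ident in dictionary.items():
        # Safe filter: edit distance is at least the length difference.
        if abs(len(word) - len(query)) > k:
            continue
        d = edit_distance(word, query)
        if d <= k:
            results.append((word, ident, d))
    return sorted(results)


if __name__ == "__main__":
    # Hypothetical toy dictionary: string -> identifier.
    lexicon = {"search": 1, "starch": 2, "march": 3, "index": 4}
    for k in (1, 2):
        print(k, approx_search(lexicon, "serch", k))
```

A scan like this costs time proportional to the total dictionary size for every query; the indices discussed in the article aim to return the same answer set while examining far less of the dictionary.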