Tries for Approximate String Matching

Authors:
H. Shang; T. H. Merrett
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
1996

Cited by
Advanced grouping and aggregation for data integration. Proceedings of the tenth international conference on Information and knowledge management.
Time-Space Trade-Off Analysis of Morphic Trie Images. IEEE Transactions on Knowledge and Data Engineering.
Matchsimile: a flexible approximate matching tool for searching proper names. Journal of the American Society for Information Science and Technology.
Efficient similarity-based operations for data integration. Data & Knowledge Engineering.
Indexing mixed types for approximate retrieval. VLDB '05 Proceedings of the 31st international conference on Very large data bases.
An efficient approach for sequence matching in large DNA databases. Journal of Information Science.
A dictionary for approximate string search and longest prefix search. CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management.
On-line Approximate String Matching in Natural Language. Fundamenta Informaticae.
High-error approximate dictionary search using estimate hash comparisons. Software: Practice and Experience.
Compacting music signatures for efficient music retrieval. EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology.
Example based video filters. Proceedings of the ACM International Conference on Image and Video Retrieval.
Using similarity-based operations for resolving data-level conflicts. BNCOD'03 Proceedings of the 20th British national conference on Databases.
Prefix tree indexing for similarity search and similarity joins on genomic data. SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management.
Exploiting available memory and disk for scalable instant overview search. WISE'11 Proceedings of the 12th international conference on Web information system engineering.
Enhancing trie-based syntactic pattern recognition using AI heuristic search strategies. ICAPR'05 Proceedings of the Third international conference on Advances in Pattern Recognition - Volume Part I.
A novel indexing method for efficient sequence matching in large DNA database environment. PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining.
Efficient approximate dictionary look-up for long words over small alphabets. LATIN'06 Proceedings of the 7th Latin American conference on Theoretical Informatics.
Scalable sequence similarity search and join in main memory on multi-cores. Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2.
Efficient similarity search in very large string sets. SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management.
A syntactic PR approach to Telugu handwritten character recognition. Proceeding of the workshop on Document Analysis and Recognition.

Abstract

Tries offer text searches with costs which are independent of the size of the document being searched, and so are important for large documents requiring secondary storage. Approximate searches, in which the search pattern differs from the document by k substitutions, transpositions, insertions or deletions, have hitherto been carried out only at costs linear in the size of the document. We present a trie-based method whose cost is independent of document size. Our experiments show that this new method significantly outperforms the nearest competitor for k = 0 and k = 1, which are arguably the most important cases. The linear cost (in k) of the other methods begins to catch up, for our small files, only at k = 2. For larger files, complexity arguments indicate that tries will outperform the linear methods for larger values of k. Trie indexes combine suffixes and so are compact in storage. When the text itself does not need to be stored, as in a spelling checker, we even obtain negative overhead: 50% compression. We discuss a variety of applications and extensions, including best match (for spelling checkers), case insensitivity, and limited approximate regular expression matching.
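The abstract does not spell out the search procedure. What follows is a minimal Python sketch of one common way to search a trie within k edits. It assumes a trie of dictionary words (the spelling-checker case), not the trie over all suffixes of a document that the paper indexes. Each trie level adds one row of the usual edit-distance table, and a transposition of two adjacent characters counts as a single edit. A subtree is abandoned as soon as every entry in its row exceeds k, so the work depends on how much of the trie lies near the pattern rather than on a scan of the whole text. The class and method names (Trie, search, best_match) are hypothetical, and this is not claimed to be the authors' exact algorithm.

```python
# Illustrative sketch only: Trie, search and best_match are hypothetical names,
# not taken from the paper. Distance = optimal string alignment (OSA) variant of
# Damerau-Levenshtein: substitutions, insertions, deletions, adjacent transpositions.
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One trie node: outgoing edges keyed by character, plus an end-of-word flag."""

    __slots__ = ("children", "is_word")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


class Trie:
    def __init__(self, words=()) -> None:
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, pattern: str, k: int) -> List[Tuple[str, int]]:
        """Return (word, distance) for every stored word within k edits of pattern."""
        results: List[Tuple[str, int]] = []
        first_row = list(range(len(pattern) + 1))  # distance from "" to each pattern prefix
        if self.root.is_word and first_row[-1] <= k:
            results.append(("", first_row[-1]))
        for ch, child in self.root.children.items():
            self._walk(child, ch, "", ch, pattern, k, first_row, None, results)
        return results

    def _walk(self, node, ch, prev_ch, prefix, pattern, k, prev_row, prev_prev_row, results):
        # row[j] = edit distance between the trie path `prefix` and pattern[:j].
        row = [prev_row[0] + 1]
        for j in range(1, len(pattern) + 1):
            cost = 0 if pattern[j - 1] == ch else 1
            best = min(row[j - 1] + 1,          # pattern has an extra character
                       prev_row[j] + 1,         # path has an extra character
                       prev_row[j - 1] + cost)  # match or substitution
            if (prev_prev_row is not None and j > 1
                    and pattern[j - 1] == prev_ch and pattern[j - 2] == ch):
                best = min(best, prev_prev_row[j - 2] + 1)  # adjacent transposition
            row.append(best)
        if node.is_word and row[-1] <= k:
            results.append((prefix, row[-1]))
        # Prune: if every entry exceeds k, no extension of this path can get back within k.
        if min(row) <= k:
            for next_ch, child in node.children.items():
                self._walk(child, next_ch, ch, prefix + next_ch, pattern, k,
                           row, prev_row, results)

    def best_match(self, pattern: str, max_k: int) -> Tuple[List[str], Optional[int]]:
        """Spelling-checker style lookup: the closest stored words, trying k = 0, 1, ..."""
        for k in range(max_k + 1):
            hits = self.search(pattern, k)
            if hits:
                # Nothing was within k - 1 edits, so every hit is exactly k away.
                return sorted(word for word, _ in hits), k
        return [], None


if __name__ == "__main__":
    trie = Trie(["cat", "cart", "act", "car", "scat", "dog"])
    print(sorted(trie.search("cat", 1)))
    print(trie.best_match("dgo", 2))
```

Here best_match approximates the best-match use mentioned in the abstract by retrying with k = 0, 1, 2, ... until some word qualifies. For instance, "dgo" is found one transposition away from "dog". Case insensitivity could be approximated in this sketch by lower-casing the stored words and the pattern before insertion and search.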