Matchsimile: a flexible approximate matching tool for searching proper names

Authors:
Gonzalo Navarro;Ricardo Baeza-Yates;João Marcelo Azevedo Arcoverde
Affiliations:
Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile;Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile;Matchsimile Ltda - CTO, Rua Ribeiro de Brito, 1002/1103, CEP 51.021-310, Recife-PE, Brazil
Venue:
Journal of the American Society for Information Science and Technology
Year:
2003

Abstract

We present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool designed specifically for extracting person and company names from large texts. Part of a larger information extraction environment, the engine receives a large set of proper names to search for, a text to search, and search options, and outputs all occurrences of those names found in the text. Beyond similarity search at the intraword level, the tool applies a set of person-name formation rules at the word level, covering combination, abbreviation, duplicate detection, ordering, and word omission and insertion, among others. The engine is used in a successful commercial application (also named Matchsimile) that searches for lawyer names in official law publications.
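The two levels described in the abstract (character-level similarity inside each word, and name-formation rules across words) can be illustrated with a small sketch. This is not the Matchsimile algorithm, which the abstract does not detail. It assumes plain Levenshtein distance as the intraword similarity measure, treats a single letter as an abbreviation of any word with that initial, ignores word order, and pairs words greedily. The function names (`edit_distance`, `word_matches`, `name_matches`) and the thresholds (`max_errors`, `max_omitted`) are hypothetical, chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def word_matches(name_word: str, text_word: str, max_errors: int = 1) -> bool:
    """Intraword level: approximate match, or a one-letter abbreviation."""
    a = name_word.lower()
    b = text_word.lower().strip(".,;:")  # drop surrounding punctuation
    if not a or not b:
        return False
    if len(b) == 1:
        # Assumed abbreviation rule: an initial stands for the whole word.
        return a[0] == b
    return edit_distance(a, b) <= max_errors


def name_matches(name: str, window: list[str],
                 max_errors: int = 1, max_omitted: int = 0) -> bool:
    """Word level: order-insensitive match that tolerates omitted name words.

    Extra words left over in `window` are treated as insertions and ignored.
    The pairing is greedy, so it is a simplification, not an optimal matching.
    """
    unused = list(window)
    matched = 0
    omitted = 0
    for name_word in name.split():
        hit = next((w for w in unused
                    if word_matches(name_word, w, max_errors)), None)
        if hit is None:
            omitted += 1
        else:
            unused.remove(hit)  # each text word may be used only once
            matched += 1
    return matched > 0 and omitted <= max_omitted
```

Under these assumptions, `name_matches("Gonzalo Navarro", ["Navarro,", "Gonzlo"])` accepts a reordered, misspelled occurrence, and `name_matches("Ricardo Baeza-Yates", ["R.", "Baeza-Yates"])` accepts an abbreviated first name. Raising `max_omitted` lets a long name match a text occurrence that drops middle names.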