We present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool designed specifically for extracting person and company names from large texts. As part of a larger information extraction environment, the engine receives a large set of proper names to search for, a text to search, and a set of search options, and it outputs all occurrences of those names found in the text. Beyond similarity search at the intraword level, the tool applies a set of person name formation rules at the word level, covering combination, abbreviation, duplicate detection, ordering, and word omission and insertion, among others. The engine is used in a successful commercial application (also named Matchsimile), which allows searching for lawyer names in official law publications.
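To make the two matching levels concrete, the following is a minimal sketch, not Matchsimile's actual algorithm. It assumes plain Levenshtein distance for the intraword level and a deliberately small set of word-level rules: initials as abbreviations, free word ordering, and a bounded number of omitted name words. The function names (`edit_distance`, `words_match`, `name_matches`) and the thresholds `max_errors` and `max_omitted` are hypothetical, chosen only for illustration. The greedy word assignment is also a simplification.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def words_match(name_word: str, text_word: str, max_errors: int = 1) -> bool:
    """Intraword level: accept an initial as an abbreviation, else allow a few edits."""
    n = name_word.lower()
    t = text_word.lower().strip(".,;")   # drop surrounding punctuation
    if len(t) == 1:                      # e.g. "J." standing for "John"
        return n.startswith(t)
    return edit_distance(n, t) <= max_errors


def name_matches(name: str, candidate: str,
                 max_errors: int = 1, max_omitted: int = 1) -> bool:
    """Word level: any order, at most max_omitted name words missing, no extra words."""
    remaining = name.split()
    text_words = candidate.split()
    if not text_words:
        return False
    for word in text_words:
        # Greedy choice: take the first unused name word that matches.
        hit = next((w for w in remaining if words_match(w, word, max_errors)), None)
        if hit is None:
            return False                 # this text word fits no unused name word
        remaining.remove(hit)
    return len(remaining) <= max_omitted


if __name__ == "__main__":
    for text in ["Jon Smith", "Smith, J. R.", "John Smith", "John Smith Jones"]:
        print(text, "->", name_matches("John Robert Smith", text))
```

Under these assumptions, reordered and abbreviated forms such as "Smith, J. R." are accepted, while a candidate containing an unrelated extra word is rejected. Supporting word insertion, one of the rules the abstract lists, would require relaxing the "no extra words" condition.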