A dictionary-based approach to fast and accurate name matching in large law enforcement databases

Authors:
Olcay Kursun; Anna Koufakou; Bing Chen; Michael Georgiopoulos; Kenneth M. Reynolds; Ron Eaglin
Affiliations:
Department of Engineering Technology, University of Central Florida, Orlando, FL; School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL; School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL; School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL; Department of Criminal Justice and Legal Studies, University of Central Florida, Orlando, FL; Department of Engineering Technology, University of Central Florida, Orlando, FL
Venue:
ISI'06: Proceedings of the 4th IEEE International Conference on Intelligence and Security Informatics
Year:
2006

Abstract

When data are dirty, a standard query (e.g., a search for a name that has been misspelled or mistyped) does not return all of the relevant records. This is an issue of grave importance in homeland security, criminology, medical applications, geographic information systems (GIS), and other domains. Various techniques, such as Soundex, Phonix, n-grams, and edit distance, have been used to improve the matching rate in these name-matching applications. There is a pressing need for name-matching approaches that provide high accuracy while keeping computational cost reasonably low. In this paper, we present ANSWER, a name-matching approach that uses a prefix tree of the names available in the database. Creating and searching the name dictionary tree is fast and accurate; thus, ANSWER is superior to other techniques for retrieving fuzzy name matches in large databases.
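
To make the idea of a prefix-tree name dictionary concrete, the following is a minimal sketch of one common way to search a trie for fuzzy matches: a depth-first walk that carries a Levenshtein edit-distance row and prunes branches that can no longer fall within a tolerance. It is not the ANSWER algorithm itself, whose details are not given in this abstract; the names `TrieNode`, `build_trie`, `fuzzy_search`, and `max_dist`, as well as the sample names, are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a prefix tree; `name` is set where a stored name ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.name: Optional[str] = None


def build_trie(names: List[str]) -> TrieNode:
    """Insert every name (upper-cased) into a prefix tree."""
    root = TrieNode()
    for name in names:
        node = root
        for ch in name.upper():
            node = node.children.setdefault(ch, TrieNode())
        node.name = name.upper()
    return root


def fuzzy_search(root: TrieNode, query: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (name, edit distance) pairs with distance <= max_dist.

    Assumption: plain Levenshtein distance (insert, delete, substitute).
    """
    query = query.upper()
    matches: List[Tuple[str, int]] = []

    def walk(node: TrieNode, ch: str, prev_row: List[int]) -> None:
        # Compute the next row of the edit-distance table for this prefix.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1,          # insertion
                           prev_row[i] + 1,         # deletion
                           prev_row[i - 1] + cost)) # substitution or match
        if node.name is not None and row[-1] <= max_dist:
            matches.append((node.name, row[-1]))
        # Prune: descend only if some cell is still within the tolerance.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    first_row = list(range(len(query) + 1))
    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(matches, key=lambda m: (m[1], m[0]))


# Hypothetical sample data, not taken from the paper.
trie = build_trie(["Smith", "Smyth", "Smithe", "Jones", "Johnson"])
print(fuzzy_search(trie, "Smith", 1))
# → [('SMITH', 0), ('SMITHE', 1), ('SMYTH', 1)]
```

Because names that share a prefix share the same path in the tree, the edit-distance rows for that prefix are computed once rather than once per name, which is one reason a dictionary tree can be searched faster than a flat list of names.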