A comparative evaluation of name-matching algorithms

Authors: L. Karl Branting
Affiliations: LiveWire Logic, Inc., Morrisville, NC
Venue: ICAIL '03: Proceedings of the 9th International Conference on Artificial Intelligence and Law
Year: 2003

Abstract

Name matching, that is, recognizing when two different strings are likely to denote the same entity, is an important task in many legal information systems, such as case-management systems. The naming conventions peculiar to legal cases limit the effectiveness of generic approximate string-matching algorithms in this task. This paper proposes a three-stage framework for name matching, identifies how each stage addresses the naming variations that typically arise in legal cases, and describes several alternative approaches to each stage. It then evaluates the performance of various combinations of these alternatives on a representative collection of names drawn from a United States District Court case-management system. The best tradeoff between accuracy and efficiency on this collection was achieved by algorithms that standardize capitalization, spacing, and punctuation; filter redundant terms; index using an abstraction function that is both order-insensitive and tolerant of small numbers of omissions or additions; and compare names in a symmetrical, word-by-word fashion.
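The pipeline named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the algorithms, so every name here (`REDUNDANT_TERMS`, `normalize`, `build_index`, `candidates`, `similarity`) is hypothetical. The stop list is invented for illustration, the index is a plain per-word inverted index standing in for the order-insensitive, omission-tolerant abstraction function, and the comparison uses a normalized edit distance averaged in both directions so that the score is symmetric.

```python
import re
from collections import defaultdict

# Hypothetical stop list; the paper's actual set of redundant terms is not given here.
REDUNDANT_TERMS = {"inc", "corp", "co", "llc", "ltd", "the", "of"}


def normalize(name):
    """Standardize case, spacing, and punctuation, then drop redundant terms."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", name.lower())
    return [w for w in cleaned.split() if w not in REDUNDANT_TERMS]


def build_index(names):
    """Map each normalized word to the names containing it (assumed indexing scheme)."""
    index = defaultdict(set)
    for name in names:
        for word in normalize(name):
            index[word].add(name)
    return index


def candidates(index, query):
    """Return names sharing at least one word with the query, regardless of word order."""
    found = set()
    for word in normalize(query):
        found |= index.get(word, set())
    return found


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def word_similarity(a, b):
    """Similarity in [0, 1]: 1.0 for identical words, 0.0 for wholly different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def directed(words_a, words_b):
    """Average, over the words of A, of each word's best match among the words of B."""
    if not words_a or not words_b:
        return 0.0
    return sum(max(word_similarity(a, b) for b in words_b) for a in words_a) / len(words_a)


def similarity(name_a, name_b):
    """Symmetric word-by-word score: the mean of the two directed scores."""
    wa, wb = normalize(name_a), normalize(name_b)
    return (directed(wa, wb) + directed(wb, wa)) / 2
```

In this sketch, "John Smith" and "SMITH, JOHN" normalize to the same set of words and score 1.0, while "John Q. Smith" still retrieves "Smith, John" from the index because a single shared word is enough to make a name a candidate. Averaging the two directed scores is one simple way to obtain symmetry; taking the minimum would penalize omitted or added words more heavily.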