A probabilistic model for approximate identity matching

  • Authors:
  • G. Alan Wang;Hsinchun Chen;Homa Atabakhsh

  • Affiliations:
  • University of Arizona, Tucson, AZ;University of Arizona, Tucson, AZ;University of Arizona, Tucson, AZ

  • Venue:
  • dg.o '06 Proceedings of the 2006 international conference on Digital government research
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Identity management is critical to various governmental practices ranging from providing citizens services to enforcing homeland security. The task of searching for a specific identity is difficult because multiple identity representations may exist due to issues related to unintentional errors and intentional deception. We propose a probabilistic Naïve Bayes model that improves existing identity matching techniques in terms of effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based technique as well as the approximate-match based record comparison algorithm. In addition, our model greatly reduces the efforts of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset that only contains 10% labeled instances, our model achieves a performance comparable to that of a fully supervised learning.