A hierarchical Naïve Bayes model for approximate identity matching

Authors:
G. Alan Wang;Homa Atabakhsh;Hsinchun Chen
Affiliations:
Department of Business Information Technology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, United States;Department of Management Information Systems, The University of Arizona, Tucson, AZ 85721, United States;Department of Management Information Systems, The University of Arizona, Tucson, AZ 85721, United States
Venue:
Decision Support Systems
Year:
2011

Abstract

Organizations routinely manage identity information for their customers, vendors, and employees. Identity management is critical to organizational practices ranging from customer relationship management to crime investigation. Searching for a specific identity is difficult because unintentional errors and intentional deception can leave the same identity recorded in disparate forms. In this paper we propose a hierarchical Naïve Bayes model that improves on existing identity matching techniques in search effectiveness. Experiments show that the proposed model performs significantly better than an exact-match-based technique. With 50% of the training instances labeled, the proposed semi-supervised learning method achieves performance comparable to the fully supervised record comparison algorithm. Semi-supervised learning thus greatly reduces the effort of manually labeling training instances without significant performance degradation.
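
The contrast the abstract draws between exact matching and probabilistic approximate matching can be illustrated with a minimal sketch. It is not the authors' hierarchical model and includes no semi-supervised (EM) step: it is a plain, fully supervised Naïve Bayes classifier over binary field-agreement features. The field names, the 0.8 similarity threshold, the use of `difflib.SequenceMatcher` as the string similarity measure, and all records are hypothetical choices made only for illustration.

```python
import math
from difflib import SequenceMatcher

# Hypothetical identity fields; the abstract does not list the attributes used.
FIELDS = ("name", "dob", "address")


def exact_match(r1, r2):
    """Baseline: two records match only if every field is identical."""
    return all(r1[f] == r2[f] for f in FIELDS)


def agreement_vector(r1, r2, threshold=0.8):
    """1 per field whose string similarity reaches the (assumed) threshold."""
    return tuple(
        int(SequenceMatcher(None, r1[f].lower(), r2[f].lower()).ratio() >= threshold)
        for f in FIELDS
    )


def train(pairs, labels):
    """Estimate P(class) and P(field agrees | class) with Laplace smoothing."""
    counts = {0: [0] * len(FIELDS), 1: [0] * len(FIELDS)}
    totals = {0: 0, 1: 0}
    for (r1, r2), y in zip(pairs, labels):
        totals[y] += 1
        for i, agrees in enumerate(agreement_vector(r1, r2)):
            counts[y][i] += agrees
    n = len(labels)
    prior = {y: (totals[y] + 1) / (n + 2) for y in (0, 1)}
    cond = {y: [(c + 1) / (totals[y] + 2) for c in counts[y]] for y in (0, 1)}
    return prior, cond


def match_probability(model, r1, r2):
    """Posterior probability that two records refer to the same identity."""
    prior, cond = model
    vec = agreement_vector(r1, r2)
    log_score = {}
    for y in (0, 1):
        s = math.log(prior[y])
        for p, agrees in zip(cond[y], vec):
            s += math.log(p if agrees else 1.0 - p)
        log_score[y] = s
    top = max(log_score.values())  # subtract the max for numerical stability
    exp = {y: math.exp(s - top) for y, s in log_score.items()}
    return exp[1] / (exp[0] + exp[1])


def rec(name, dob, address):
    return {"name": name, "dob": dob, "address": address}


# Made-up labeled record pairs: 1 = same identity, 0 = different identities.
john = rec("John Smith", "1980-01-02", "12 Main St")
maria = rec("Maria Garcia", "1975-06-30", "4 Oak Ave")
pairs = [
    (john, rec("Jon Smith", "1980-01-02", "12 Main Street")),
    (maria, rec("Maria Garcia", "1975-06-03", "4 Oak Ave")),
    (john, maria),
    (john, rec("Ann Smith", "2005-03-14", "12 Main St")),
]
labels = [1, 1, 0, 0]
model = train(pairs, labels)

query = rec("Robert Brown", "1990-12-25", "77 Pine Rd")
variant = rec("Robert Browne", "1990-12-25", "77 Pine Road")
stranger = rec("Linda Park", "1962-04-09", "3 Elm Ct")

print(exact_match(query, variant))
print(match_probability(model, query, variant))
print(match_probability(model, query, stranger))
```

In this toy setup the exact-match baseline rejects the spelling variant, while the probabilistic score still ranks it well above the unrelated record. A semi-supervised variant would additionally estimate labels for unlabeled pairs, which this sketch does not attempt.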