Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
A Distance-Based Approach to Entity Reconciliation in Heterogeneous Databases
IEEE Transactions on Knowledge and Data Engineering
Automatically detecting deceptive criminal identities
Communications of the ACM - Homeland security
A hierarchical graphical model for record linkage
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Open code for digital government
dg.o '03 Proceedings of the 2003 annual national conference on Digital government research
Cross-jurisdictional activity networks to support criminal investigations
dg.o '04 Proceedings of the 2004 annual national conference on Digital government research
Discovering identity problems: a case study
ISI'05 Proceedings of the 2005 IEEE international conference on Intelligence and Security Informatics
Hi-index | 0.00 |
Identity management is critical to various governmental practices ranging from providing citizens services to enforcing homeland security. The task of searching for a specific identity is difficult because multiple identity representations may exist due to issues related to unintentional errors and intentional deception. We propose a probabilistic Naïve Bayes model that improves existing identity matching techniques in terms of effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based technique as well as the approximate-match based record comparison algorithm. In addition, our model greatly reduces the efforts of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset that only contains 10% labeled instances, our model achieves a performance comparable to that of a fully supervised learning.