Weakly supervised learning for cross-document person name disambiguation supported by information extraction

Authors:
Cheng Niu; Wei Li; Rohini K. Srihari
Affiliations:
Cymfony Inc., Williamsville, NY; Cymfony Inc., Williamsville, NY; Cymfony Inc., Williamsville, NY
Venue:
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Year:
2004

Abstract

It is fairly common for different people to share the same name. In tracking person entities in a large document pool, it is important to determine whether multiple mentions of the same name across documents refer to the same entity. The previous approach to this problem measures context similarity based only on co-occurring words. This paper presents a new algorithm that uses information extraction support in addition to co-occurring words. A learning scheme with minimal supervision is developed within the Bayesian framework. Maximum entropy modeling is then used to represent the probability distribution of context similarities based on heterogeneous features. Statistical annealing is applied to derive the final entity coreference chains by globally fitting the pairwise context similarities. Benchmarking shows that the new approach significantly outperforms the existing algorithm, by 25 percentage points in overall F-measure.
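
The final step described in the abstract, deriving coreference chains by globally fitting pairwise similarities with annealing, can be illustrated with a minimal sketch. The abstract does not give the objective function, the move set, or the cooling schedule, so everything below is an assumption: the pairwise same-entity probabilities in `prob` are taken as given (in the paper they would come from the maximum entropy model), the objective is a plain pairwise log-likelihood, and the names `partition_score` and `anneal_chains` are hypothetical.

```python
import math
import random


def partition_score(labels, prob):
    """Log-likelihood of a partition under pairwise same-entity probabilities.

    Assumed objective (not taken from the paper): each pair contributes
    log p if both mentions share a chain, and log(1 - p) otherwise.
    """
    score = 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            # Clamp to avoid log(0) on degenerate probabilities.
            p = min(max(prob[i][j], 1e-6), 1 - 1e-6)
            score += math.log(p) if labels[i] == labels[j] else math.log(1 - p)
    return score


def anneal_chains(prob, steps=5000, t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing over chain assignments; returns the best labeling seen."""
    rng = random.Random(seed)
    n = len(prob)
    labels = list(range(n))  # start with every mention in its own chain
    current = partition_score(labels, prob)
    best_labels, best = labels[:], current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n)
        new_label = rng.randrange(n)  # n labels suffice for n mentions
        if new_label == labels[i]:
            continue
        old_label = labels[i]
        labels[i] = new_label
        candidate = partition_score(labels, prob)
        delta = candidate - current
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if current > best:
                best_labels, best = labels[:], current
        else:
            labels[i] = old_label  # reject the move
    return best_labels


# Toy input: mentions 0 and 1 look alike, as do 2 and 3 (made-up values).
prob = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
chains = anneal_chains(prob)
```

Mentions that end up with the same label in `chains` form one coreference chain. Recomputing the full score on every move keeps the sketch short; a practical version would update only the terms involving the moved mention.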