Clustering technique in multi-document personal name disambiguation

Authors:
Chen Chen;Hu Junfeng;Wang Houfeng
Affiliations:
Key Laboratory of Computational Linguistics (Peking University), China;Key Laboratory of Computational Linguistics (Peking University), China;Key Laboratory of Computational Linguistics (Peking University), China
Venue:
ACLstudent '09 Proceedings of the ACL-IJCNLP 2009 Student Research Workshop
Year:
2009

Abstract

Focusing on multi-document personal name disambiguation, this paper develops an agglomerative clustering approach to the problem. We start from an analysis of point-wise mutual information between features and the ambiguous name, which leads to a novel method for computing feature weights in clustering. We then propose a trade-off measure between within-cluster compactness and among-cluster separation as a criterion for stopping clustering. After that, we apply a labeling method to find representative features for each cluster. Finally, experiments on word-based clustering over a Chinese dataset show that the approach performs well.
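
The abstract does not give the paper's weighting formula or its stopping measure, so the following is only a minimal sketch of the general pipeline it describes: weight features by point-wise mutual information with the ambiguous name, then merge documents bottom-up until a stopping rule fires. The textbook definition PMI(f, name) = log(P(f, name) / (P(f) P(name))) is used here, and a plain similarity threshold stands in for the compactness/separation trade-off. All function names, the `min_similarity` parameter, and the toy PMI values are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi(n_both, n_feature, n_name, n_docs):
    """Textbook PMI from document counts: log(P(f, name) / (P(f) * P(name)))."""
    if n_both == 0:
        return 0.0  # assumption: treat an unseen co-occurrence as neutral
    return math.log((n_both * n_docs) / (n_feature * n_name))


def weight_vector(tokens, feature_pmi):
    """Term frequency scaled by PMI; non-positive PMI features are dropped (an assumption)."""
    counts = Counter(tokens)
    return {f: c * feature_pmi[f] for f, c in counts.items() if feature_pmi.get(f, 0.0) > 0}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def agglomerate(vectors, min_similarity=0.2):
    """Average-link agglomerative clustering.

    Stops when the best pair of clusters is less similar than min_similarity.
    This threshold is a stand-in, not the paper's trade-off measure.
    """
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        return sum(cosine(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # combinations yields index pairs with x < y, so deleting y keeps x valid
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        if link(clusters[x], clusters[y]) < min_similarity:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy usage: four documents mentioning the same name, with made-up PMI values.
feature_pmi = {"physics": 1.2, "quantum": 0.9, "football": 1.1, "league": 0.8}
docs = [
    ["physics", "quantum"],
    ["quantum", "physics", "physics"],
    ["football", "league"],
    ["league", "football"],
]
vectors = [weight_vector(d, feature_pmi) for d in docs]
clusters = agglomerate(vectors)
```

In this toy run the two physics documents and the two football documents share no features, so merging stops once each topic has formed its own cluster. A criterion like the one the abstract describes would replace the fixed threshold with a score computed over the whole partition at each merge step.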