This paper considers five features for personal name disambiguation: titles, community chains, terms, temporal expressions, and hostnames. Using nine test data sets covering three ambiguous personal names, we address the awareness degree of an entity, the source of materials, and web pages in different areas. In a single-clusterer approach, employing all features achieves an average F-score of 0.635, which is better than the 0.502 obtained with contextual terms only. When community chains are expanded by using the web, the average F-score increases to 0.676. We also propose a multiple-clusterer approach, which cascades five clusterers corresponding to the five features; it further improves the average F-score to 0.684. Expanding community chains also raises the average F-score of the multiple-clusterer approach to 0.697. In summary, the proposed features are quite useful; the cascaded multiple-clusterer approach is better than the single-clusterer approach; and expanding community chains using the web has a positive effect on personal name disambiguation. Overall, the experiments show the average F-score improving from 0.502 (contextual terms only) to 0.697 (cascaded clusterers with expanded community chains).
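The idea of cascading one clusterer per feature can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function name `cascade_cluster`, the feature keys, the toy pages, and in particular the merge rule (two clusters are merged at a stage when they share at least one value of that stage's feature). The abstract does not specify the actual clusterers, similarity measures, or cascade order.

```python
from typing import Dict, List, Set

# Hypothetical feature keys, in the order the abstract lists the features.
FEATURES = ["titles", "community_chains", "terms",
            "temporal_expressions", "hostnames"]

# A page is modelled as a mapping from feature name to a set of values.
Page = Dict[str, Set[str]]


def cascade_cluster(pages: List[Page],
                    features: List[str] = FEATURES) -> List[Set[int]]:
    """Run one merging stage per feature, feeding each stage's clusters
    into the next. Assumed merge rule: clusters sharing any value of the
    current feature are merged."""
    # Start with every page in its own cluster (clusters hold page indices).
    clusters: List[Set[int]] = [{i} for i in range(len(pages))]
    for feature in features:
        groups: List[Set[int]] = []
        group_values: List[Set[str]] = []
        for cluster in clusters:
            # Collect this cluster's values for the current feature.
            values: Set[str] = set()
            for i in cluster:
                values |= pages[i].get(feature, set())
            members = set(cluster)
            kept_groups, kept_values = [], []
            # Absorb every existing group that shares a value.
            for group, gvals in zip(groups, group_values):
                if values & gvals:
                    members |= group
                    values |= gvals
                else:
                    kept_groups.append(group)
                    kept_values.append(gvals)
            groups = kept_groups + [members]
            group_values = kept_values + [values]
        clusters = groups
    return clusters


# Toy pages mentioning the same ambiguous name (invented data).
pages = [
    {"titles": {"professor"}, "hostnames": {"a.edu"}},
    {"titles": {"professor"}, "hostnames": {"b.org"}},
    {"titles": {"singer"}, "hostnames": {"c.com"}},
    {"hostnames": {"c.com"}},
]
clusters = cascade_cluster(pages)
print(sorted(sorted(c) for c in clusters))
```

In this toy run the title stage groups the first two pages, and the hostname stage later attaches the fourth page to the third, which shows how a later feature can resolve pages that an earlier feature left unclustered.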