Selecting hierarchical clustering cut points for web person-name disambiguation

Authors:
Jun Gong; Douglas W. Oard
Affiliations:
Department of Information System, Beihang University, Beijing, China; College of Information Studies/UMIACS, University of Maryland, College Park, MD, USA
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Abstract

Hierarchical clustering is often used to group person-name mentions that refer to the same entity. Because the correct number of clusters for a given person-name is not known a priori, a method is needed for deciding where to cut the resulting dendrogram so that the risks of over-clustering and under-clustering are balanced. This paper reports on experiments in which outcome-specific and result-set measures are used to learn a global similarity threshold. Results on the Web People Search (WePS)-2 task indicate that approximately 85% of the optimal F1 measure can be achieved on held-out data.
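
The idea of a single, globally learned cut point can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes average-link agglomerative clustering over a precomputed document similarity matrix, pairwise F1 as a stand-in for the evaluation measure, and a plain grid search over candidate thresholds. The function names (`cut_clusters`, `pairwise_f1`, `learn_global_threshold`) are hypothetical, and the outcome-specific and result-set measures mentioned in the abstract are not modeled.

```python
from itertools import combinations


def cut_clusters(sim, threshold):
    """Average-link agglomerative clustering that stops merging once the
    best available merge falls below `threshold` (assumed linkage; the
    abstract does not specify one)."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            score = total / (len(clusters[a]) * len(clusters[b]))
            if score > best:
                best, pair = score, (a, b)
        if best < threshold:
            break  # this is the dendrogram cut point
        a, b = pair
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters


def pairwise_f1(clusters, gold):
    """Pairwise F1 between predicted clusters and gold entity labels
    (an assumed stand-in for the F1 measure named in the abstract)."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = label[i] == label[j]
        same_gold = gold[i] == gold[j]
        tp += same_pred and same_gold
        fp += same_pred and not same_gold
        fn += same_gold and not same_pred
    if tp + fp == 0 and tp + fn == 0:
        return 1.0  # no same-entity pairs predicted or expected
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def learn_global_threshold(training, candidates):
    """Pick the one threshold that maximizes mean F1 over all training
    names; `training` is a list of (similarity_matrix, gold_labels)."""
    def mean_f1(t):
        scores = [pairwise_f1(cut_clusters(s, t), g) for s, g in training]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_f1)
```

In this sketch the learned threshold would then be applied unchanged to the dendrogram of each held-out person-name, which is what makes it global rather than tuned per name.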