AUG: a combined classification and clustering approach for web people disambiguation

Authors:
Els Lefever; Véronique Hoste; Timur Fayruzov
Affiliations:
Ghent University Association, Gent; Ghent University Association, Gent; Ghent University Association, Gent
Venue:
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
Year:
2007

Citing 5

Bridging the lexical chasm: statistical approaches to answer-finding
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval

Entity-based cross-document coreferencing using the Vector Space Model
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1

Unsupervised personal name disambiguation
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4

Memory-Based Language Processing (Studies in Natural Language Processing)

The SemEval-2007 WePS evaluation: establishing a benchmark for the web people search task
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations

Cited 5

Name perplexity
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Person cross document coreference with name perplexity estimates
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2

Clustering web people search results using fuzzy ants
Information Sciences: an International Journal

Dynamic parameters for cross document coreference
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters

Methods of estimating the number of clusters for person cross document coreference task
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I

Abstract

This paper presents a combined supervised and unsupervised approach for multi-document person name disambiguation. Based on feature vectors reflecting pairwise comparisons between web pages, a classification algorithm provides linking information about document pairs, which leads to initial clusters. In addition, two different clustering algorithms are fed with matrices of weighted keywords. In a final step the "seed" clusters are combined with the results of the clustering algorithms. Results on the validation data show that the combined classification and clustering approach does not always compare favorably with the results obtained by the individual algorithms separately.
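
The abstract does not say how the pairwise linking decisions are turned into initial clusters. One plausible reading, shown here only as a minimal sketch, is to take the transitive closure of the links: if the classifier links pages A and B, and B and C, all three land in the same seed cluster. The function name `seed_clusters`, the integer document ids, and the union-find strategy are assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, Iterable, List, Tuple


def seed_clusters(n_docs: int, linked_pairs: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Group documents 0..n_docs-1 by the transitive closure of pairwise links.

    `linked_pairs` stands in for the document pairs a classifier judged to
    refer to the same person (hypothetical input format).
    """
    parent = list(range(n_docs))  # each document starts as its own cluster

    def find(x: int) -> int:
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in linked_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Merge the two clusters; keep the smaller id as the root.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups: Dict[int, List[int]] = {}
    for doc in range(n_docs):
        groups.setdefault(find(doc), []).append(doc)
    return sorted(groups.values())


if __name__ == "__main__":
    # Pages 0, 1 and 2 are linked in a chain; pages 3 and 4 stay as singletons.
    print(seed_clusters(5, [(0, 1), (1, 2)]))
```

Under this assumption, documents that receive no link from the classifier remain singleton clusters, which a later clustering step could merge or leave alone.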