Bridging the lexical chasm: statistical approaches to answer-finding
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Entity-based cross-document coreferencing using the Vector Space Model
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Unsupervised personal name disambiguation
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Memory-Based Language Processing (Studies in Natural Language Processing)
Memory-Based Language Processing (Studies in Natural Language Processing)
The SemEval-2007 WePS evaluation: establishing a benchmark for the web people search task
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Person cross document coreference with name perplexity estimates
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Clustering web people search results using fuzzy ants
Information Sciences: an International Journal
Dynamic parameters for cross document coreferece
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Methods of estimating the number of clusters for person cross document coreference task
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Hi-index | 0.00 |
This paper presents a combined supervised and unsupervised approach for multi-document person name disambiguation. Based on feature vectors reflecting pairwise comparisons between web pages, a classification algorithm provides linking information about document pairs, which leads to initial clusters. In addition, two different clustering algorithms are fed with matrices of weighted keywords. In a final step the "seed" clusters are combined with the results of the clustering algorithms. Results on the validation data show that a combined classification and clustering approach doesn't always compare favorably to those obtained by the different algorithms separately.