Methods of estimating the number of clusters for person cross document coreference task

Authors:
Octavian Popescu; Roberto Zanoli
Venue:
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Year:
2012

Citing 10
Cited 0

Cross-document summarization by concept classification
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval

Entity-based cross-document coreferencing using the Vector Space Model
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1

Whither written language evaluation?
HLT '94 Proceedings of the workshop on Human Language Technology

Automatic cluster stopping with criterion functions and the gap statistic
NAACL-Demonstrations '06 Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume: demonstrations

Using a knowledge base to disambiguate personal name in web search results
Proceedings of the 2007 ACM symposium on Applied computing

Inferring Coreferences Among Person Names in a Large Corpus of News Collections
AI*IA '07 Proceedings of the 10th Congress of the Italian Association for Artificial Intelligence on AI*IA 2007: Artificial Intelligence and Human-Oriented Computing

Unsupervised Discrimination of Person Names in Web Contexts
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing

The SemEval-2007 WePS evaluation: establishing a benchmark for the web people search task
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations

AUG: a combined classification and clustering approach for web people disambiguation
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations

Shallow semantics for coreference resolution
IJCAI '07 Proceedings of the 20th international joint conference on Artificial intelligence

Abstract

Knowing how many different individuals share the same name can improve the overall accuracy of a Person Cross Document Coreference system, which processes large corpora and clusters name mentions according to the individuals they refer to. In this paper we present a series of methods for estimating this number. In particular, an estimation method based on name perplexity brings a large improvement over the baseline given by the gap statistic. This method is instrumental in reaching accurate clustering results: it not only predicts the number of clusters with high confidence, but also indicates which type of clustering method works best for each particular name.