Knowing how many different individuals share the same name may improve the overall accuracy of a person cross-document coreference system, which processes large corpora and clusters name mentions according to the individuals they refer to. In this paper we present a series of methods for estimating this number. In particular, an estimation method based on name perplexity brings a large improvement over the baseline given by the gap statistic. It is instrumental in reaching accurate clustering results because it not only predicts the number of clusters with very good confidence, but also indicates which type of clustering method works best for each particular name.