Ambiguous person names are a problem in many forms of written text, including text found on the Web. In this paper we explore the use of unsupervised clustering techniques to discriminate among entities named in Web pages. We examine three main issues via an extensive experimental study: first, the effect of using a held-out set of training data for feature selection versus using the data in which the ambiguous names occur; second, the impact of using different measures of association for identifying lexical features; and third, the success of different cluster stopping measures that automatically determine the number of clusters in the data.