Unsupervised Discrimination of Person Names in Web Contexts

Authors:
Ted Pedersen; Anagha Kulkarni
Affiliations:
University of Minnesota, Duluth, MN 55812, USA; Carnegie Mellon University, Pittsburgh, PA 15213, USA
Venue:
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2009

Abstract

Ambiguous person names are a problem in many forms of written text, including text found on the Web. In this paper we explore the use of unsupervised clustering techniques to discriminate among entities named in Web pages. We examine three main issues via an extensive experimental study: first, the effect of using a held-out set of training data for feature selection versus using the data in which the ambiguous names occur; second, the impact of using different measures of association for identifying lexical features; and third, the success of different cluster stopping measures that automatically determine the number of clusters in the data.
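
The abstract does not say which measures of association were compared. Purely as an illustration of what such a measure computes, the sketch below scores a word pair with the log-likelihood ratio (G^2) over a 2x2 table of co-occurrence counts. The function name, the argument layout, and the choice of G^2 itself are assumptions made for this example, not details taken from the paper.

```python
import math


def log_likelihood_ratio(n11, n12, n21, n22):
    """G^2 score for a 2x2 contingency table of bigram counts.

    Hypothetical layout (an assumption for this sketch):
      n11: w1 followed by w2
      n12: w1 followed by some other word
      n21: some other word followed by w2
      n22: neither w1 first nor w2 second
    """
    total = n11 + n12 + n21 + n22
    row_sums = (n11 + n12, n21 + n22)
    col_sums = (n11 + n21, n12 + n22)
    observed = ((n11, n12), (n21, n22))

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                # Expected count if the two words were independent.
                e = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / e)
    return 2.0 * g2


# A pair whose counts match independence scores zero; a pair that
# always occurs together scores well above zero.
independent = log_likelihood_ratio(10, 10, 10, 10)
associated = log_likelihood_ratio(20, 0, 0, 20)
```

In a feature-selection setting, pairs scoring above some threshold would be kept as lexical features; the threshold is a tuning choice and is not specified here.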