Name discrimination by clustering similar contexts

Authors:
Ted Pedersen;Amruta Purandare;Anagha Kulkarni
Affiliations:
University of Minnesota, Duluth, MN;University of Pittsburgh, Pittsburgh, PA;University of Minnesota, Duluth, MN
Venue:
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2005

Citing 9
Cited 50

Abstract

It is relatively common for different people or organizations to share the same name. As the amount of information available online grows, so does the likelihood of finding misleading or incorrect information because of confusion caused by an ambiguous name. This paper presents an unsupervised approach that resolves name ambiguity by clustering the instances of a given name into groups, each of which is associated with a distinct underlying entity. The features we employ to represent the context of an ambiguous name are statistically significant bigrams that occur in the same context as the ambiguous name. From these features we create a co-occurrence matrix in which the rows and columns represent the first and second words of the bigrams, and the cells contain their log-likelihood scores. We then represent each context in which an ambiguous name appears with a second-order context vector, created by averaging the co-occurrence matrix vectors associated with the words that make up that context. This yields a high-dimensional “instance by word” matrix that is reduced to its most significant dimensions by Singular Value Decomposition (SVD). The different “meanings” of a name are discriminated by clustering these second-order context vectors with the method of Repeated Bisections. We evaluate this approach by conflating pairs of names found in a large corpus of text to create ambiguous pseudo-names. We find that our method is significantly more accurate than the majority classifier, and that the best results are obtained by using a small amount of local context to represent the instance along with a larger amount of context for identifying features, or vice versa.
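
The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. All function names are hypothetical. The significance cutoff of 3.841 (the chi-square critical value at p = 0.05 with one degree of freedom) is an assumption, since the abstract does not state a threshold. The clustering step is a simplified bisecting 2-means on cosine similarity, standing in for the Repeated Bisections method, whose exact criterion function the abstract does not give.

```python
import math
from collections import Counter

import numpy as np


def log_likelihood(n11, n1p, np1, npp):
    """G^2 score of a bigram from its 2x2 contingency table counts."""
    observed = [n11, n1p - n11, np1 - n11, npp - n1p - np1 + n11]
    expected = [n1p * np1 / npp, n1p * (npp - np1) / npp,
                (npp - n1p) * np1 / npp, (npp - n1p) * (npp - np1) / npp]
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def cooccurrence_matrix(contexts, threshold=3.841):
    """Rows: first words, columns: second words, cells: G^2 scores.
    The threshold is an assumed cutoff, not a value from the paper."""
    counts = Counter()
    for tokens in contexts:
        counts.update(zip(tokens, tokens[1:]))
    total = sum(counts.values())
    first, second = Counter(), Counter()
    for (w1, w2), c in counts.items():
        first[w1] += c
        second[w2] += c
    rows = {w: i for i, w in enumerate(sorted(first))}
    cols = {w: j for j, w in enumerate(sorted(second))}
    matrix = np.zeros((len(rows), len(cols)))
    for (w1, w2), c in counts.items():
        score = log_likelihood(c, first[w1], second[w2], total)
        if score >= threshold:
            matrix[rows[w1], cols[w2]] = score
    return matrix, rows


def context_vectors(contexts, matrix, rows):
    """Second-order vectors: average the matrix rows of each context's words."""
    vectors = np.zeros((len(contexts), matrix.shape[1]))
    for i, tokens in enumerate(contexts):
        idx = [rows[t] for t in tokens if t in rows]
        if idx:
            vectors[i] = matrix[idx].mean(axis=0)
    return vectors


def reduce_svd(X, k):
    """Project the instance-by-word matrix onto its top k singular dimensions."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k] * s[:k]


def two_means(X, iters=20):
    """Split unit vectors into two groups by cosine similarity."""
    a = X[np.argmin(X @ X.mean(axis=0))]   # point least like the centroid
    b = X[np.argmin(X @ a)]                # point least like the first seed
    for _ in range(iters):
        side = (X @ b) > (X @ a)
        if side.all() or not side.any():
            break
        a, b = X[~side].mean(axis=0), X[side].mean(axis=0)
    return side


def repeated_bisections(X, k):
    """Simplified stand-in: keep bisecting the largest cluster until k clusters."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0, 1.0, norms)
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, k):
        members = np.flatnonzero(labels == np.bincount(labels).argmax())
        side = two_means(X[members])
        if side.all() or not side.any():
            break                          # cluster cannot be split further
        labels[members[side]] = new_label
    return labels


if __name__ == "__main__":
    # Toy contexts; threshold=0.0 keeps every bigram because the data is tiny.
    toy = [["stock", "market", "shares"], ["stock", "market", "prices"],
           ["football", "league", "goal"], ["football", "league", "match"]]
    M, row_index = cooccurrence_matrix(toy, threshold=0.0)
    reduced = reduce_svd(context_vectors(toy, M, row_index), k=2)
    print(repeated_bisections(reduced, k=2))
```

In this sketch the same set of contexts is used both to identify bigram features and to build the instance vectors. The abstract's finding about differing amounts of context for the two roles suggests they could instead be drawn from separately sized windows.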