Identification and tracing of ambiguous names: discriminative and generative approaches

Authors:
Xin Li;Paul Morie;Dan Roth
Affiliations:
Department of Computer Science, University of Illinois, Urbana, IL;Department of Computer Science, University of Illinois, Urbana, IL;Department of Computer Science, University of Illinois, Urbana, IL
Venue:
AAAI'04 Proceedings of the 19th national conference on Artificial intelligence
Year:
2004

Abstract

A given entity, representing a person, a location, or an organization, may be mentioned in text in multiple, ambiguous ways. Understanding natural language requires identifying whether different mentions of a name, within and across documents, refer to the same entity. We present two machine learning approaches to this problem, which we call the "Robust Reading" problem. The first is a discriminative approach, trained in a supervised way. The second is a generative model, at the heart of which is a view of how documents are generated and how names (of different entity types) are "sprinkled" into them. In its most general form, our model assumes: (1) a joint distribution over entities (e.g., a document that mentions "President Kennedy" is more likely to mention "Oswald" or "White House" than "Roger Clemens"); (2) an "author" model, which assumes that at least one mention of an entity in a document is easily identifiable and then generates other mentions via (3) an appearance model, which governs how mentions are transformed from the "representative" mention. We show that both approaches perform very accurately, in the range of 90% to 95% F1 measure for different entity types, much better than previous approaches to (some aspects of) this problem. Our extensive experiments exhibit the contribution of relational and structural features and, somewhat surprisingly, show that the assumptions made within our generative model are strong enough to yield a very powerful approach that performs better than a supervised approach with limited supervised information.
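The three-part generative story in the abstract can be illustrated with a toy sketch. This is not the authors' model: the entity sets, their weights, the name-shortening rules, and every function name below are hypothetical, chosen only to show how a joint distribution over entities, an "author" step that emits a representative mention, and an appearance step that transforms it might fit together.

```python
import random

# Hypothetical toy data: weighted sets of entities that tend to co-occur.
# This stands in for the joint distribution over entities.
ENTITY_SETS = [
    (("John F. Kennedy", "Lee Harvey Oswald", "White House"), 0.7),
    (("Roger Clemens", "New York Yankees"), 0.3),
]


def sample_entities(rng):
    """Step 1: draw one set of co-occurring entities for a document."""
    sets, weights = zip(*ENTITY_SETS)
    return rng.choices(sets, weights=weights, k=1)[0]


def appearance_variants(representative):
    """Step 3 (toy appearance model): forms a mention may be transformed into."""
    tokens = representative.split()
    variants = [representative]
    if len(tokens) > 1:
        variants.append(tokens[-1])                        # last token only
        variants.append(f"{tokens[0][0]}. {tokens[-1]}")   # initial + last token
    return variants


def generate_document_mentions(rng, extra_mentions=2):
    """Step 2 (toy "author" model): for each entity, emit one representative
    mention, then further mentions drawn from the appearance model.

    Returns a list of (entity, mention) pairs.
    """
    mentions = []
    for entity in sample_entities(rng):
        # Assumption: the representative mention is the full canonical name.
        mentions.append((entity, entity))
        for _ in range(extra_mentions):
            mentions.append((entity, rng.choice(appearance_variants(entity))))
    return mentions


if __name__ == "__main__":
    for entity, mention in generate_document_mentions(random.Random(0)):
        print(f"{mention!r} -> {entity!r}")
```

Reading the sketch in reverse gives the intuition for the identification task: given only the mentions in a document, the goal is to recover which entity produced each one.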