The merge/purge problem for large databases
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Learning to match and cluster large high-dimensional data sets for data integration
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Adaptive duplicate detection using learnable string similarity measures
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
A machine learning approach to coreference resolution of noun phrases
Computational Linguistics - Special issue on computational anaphora resolution
Improving machine learning approaches to coreference resolution
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Unsupervised personal name disambiguation
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Profile-Based Object Matching for Information Integration
IEEE Intelligent Systems
Semantic integration in text: from ambiguous names to identifiable entities
AI Magazine - Special issue on semantic integration
Domain-independent data cleaning via analysis of entity-relationship graph
ACM Transactions on Database Systems (TODS)
Named entity transliteration with comparable corpora
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Weakly supervised named entity transliteration and discovery from multilingual comparable corpora
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Named entity transliteration and discovery from multilingual comparable corpora
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Adaptive graphical approach to entity resolution
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
An inference model for semantic entailment in natural language
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 3
Structured generative models for unsupervised named-entity clustering
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Unsupervised methods for determining object and relation synonyms on the web
Journal of Artificial Intelligence Research
Profile based cross-document coreference using kernelized fuzzy relational clustering
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Discriminative training of clustering functions: theory and experiments with entity identification
CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
Cross-document entity tracking
ECIR'07 Proceedings of the 29th European conference on IR research
Self-tuning in graph-based reference disambiguation
DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
A generative entity-mention model for linking entities with knowledge base
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Collective entity linking in web text: a graph-based method
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
An inference model for semantic entailment in natural language
MLCW'05 Proceedings of the First international conference on Machine Learning Challenges: evaluating Predictive Uncertainty Visual Object Classification, and Recognizing Textual Entailment
Structured databases of named entities from Bayesian nonparametrics
EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP
A probabilistic model for canonicalizing named entity mentions
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
An entity-topic model for entity linking
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Entity linking at the tail: sparse signals, unknown entities, and phrase models
Proceedings of the 7th ACM international conference on Web search and data mining
Hi-index | 0.00 |
A given entity - representing a person, a location or an organization - may be mentioned in text in multiple, ambiguous ways. Understanding natural language requires identifying whether different mentions of a name, within and across documents, represent the same entity. We present two machine learning approaches to this problem, which we call the "Robust Reading" problem. Our first approach is a discriminative approach, trained in a supervised way. Our second approach is a generative model, at the heart of which is a view on how documents are generated and how names (of different entity types) are "sprinkled" into them. In its most general form, our model assumes: (1) a joint distribution over entities (e.g., a document that mentions "President Kennedy" is more likely to mention "Oswald" or "White House" than "Roger Clemens"), (2) an "author" model, that assumes that at least one mention of an entity in a document is easily identifiable, and then generates other mentions via (3) an appearance model, governing how mentions are transformed from tile "representative" mention. We show that both approaches perform very accurately, in the range of 90% - 95% F1 measure for different entity types, much better than previous approaches to (some aspects of) this problem. Our extensive experiments exhibit the contribution of relational and structural features and, somewhat surprisingly, that the assumptions made within our generative model are strong enough to yield a very powerful approach, that performs better than a supervised approach with limited supervised information.