Hardening soft information sources
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Unsupervised learning of name structure from coreference data
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Introduction to the CoNLL-2002 shared task: language-independent named entity recognition
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Adaptive Name Matching in Information Integration
IEEE Intelligent Systems
Collective entity resolution in relational data
ACM Transactions on Knowledge Discovery from Data (TKDD)
Incorporating non-local information into information extraction systems by Gibbs sampling
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Introduction to Information Retrieval
Introduction to Information Retrieval
Identification and tracing of ambiguous names: discriminative and generative approaches
AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Understanding the value of features for coreference resolution
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Joint unsupervised coreference resolution with Markov logic
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Structured generative models for unsupervised named-entity clustering
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
The SemEval-2007 WePS evaluation: establishing a benchmark for the web people search task
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
PSNUS: web people name disambiguation by simple clustering with rich features
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
Coreference resolution in a modular, entity-centered model
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
An entity-level approach to information extraction
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Large-scale cross-document coreference using distributed inference and hierarchical models
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Template-based information extraction without the templates
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
A probabilistic model for canonicalizing named entity mentions
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Name phylogeny: a generative model of string variation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Hi-index | 0.00 |
We present a nonparametric Bayesian approach to extract a structured database of entities from text. Neither the number of entities nor the fields that characterize each entity are provided in advance; the only supervision is a set of five prototype examples. Our method jointly accomplishes three tasks: (i) identifying a set of canonical entities, (ii) inferring a schema for the fields that describe each entity, and (iii) matching entities to their references in raw text. Empirical evaluation shows that the approach learns an accurate database of entities and a sensible model of name structure.