Universal schema for entity type prediction

Authors:
Limin Yao;Sebastian Riedel;Andrew McCallum
Affiliations:
University of Massachusetts, Amherst, MA, USA;University College London, London, United Kingdom;University of Massachusetts, Amherst, MA, USA
Venue:
Proceedings of the 2013 workshop on Automated knowledge base construction
Year:
2013

Citing 19
Cited 0

Class-based n-gram models of natural language

Computational Linguistics
A neural probabilistic language model

The Journal of Machine Learning Research
Fine grained classification of named entities

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Incorporating non-local information into information extraction systems by Gibbs sampling

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Freebase: a collaboratively created graph database for structuring human knowledge

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A unified architecture for natural language processing: deep neural networks with multitask learning

Proceedings of the 25th international conference on Machine learning
Instance-based ontology population exploiting named-entity substitution

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Learning entailment rules for unary templates

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Scaling textual inference to the web

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Structured generative models for unsupervised named-entity clustering

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Coupled semi-supervised learning for information extraction

Proceedings of the third ACM international conference on Web search and data mining
Assessing the challenge of fine-grained named entity recognition and classification

NEWS '10 Proceedings of the 2010 Named Entities Workshop
Collective cross-document relation extraction without labelled data

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Learning first-order Horn clauses from web text

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Structured relation discovery using generative models

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Mining entity types from query logs via user intent modeling

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Probabilistic databases of universal schema

AKBC-WEKEX '12 Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction
Hierarchical target type identification for entity-oriented queries

Proceedings of the 21st ACM international conference on Information and knowledge management
YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia

Artificial Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

Categorizing entities by their types is useful in many applications, including knowledge base construction, relation extraction and query intent prediction. Fine-grained entity type ontologies are especially valuable, but typically difficult to design because of unavoidable quandaries about level of detail and boundary cases. Automatically classifying entities by type is challenging as well, usually involving hand-labeling data and training a supervised predictor. This paper presents a universal schema approach to fine-grained entity type prediction. The set of types is taken as the union of textual surface patterns (e.g. appositives) and pre-defined types from available databases (e.g. Freebase)---yielding not tens or hundreds of types, but more than ten thousands of entity types, such as financier, criminologist, and musical trio. We robustly learn mutual implication among this large union by learning latent vector embeddings from probabilistic matrix factorization, thus avoiding the need for hand-labeled data. Experimental results demonstrate more than 30% reduction in error versus a traditional classification approach on predicting fine-grained entities types.