Collective cross-document relation extraction without labelled data

Authors:
Limin Yao; Sebastian Riedel; Andrew McCallum
Affiliations:
University of Massachusetts, Amherst; University of Massachusetts, Amherst; University of Massachusetts, Amherst
Venue:
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Year:
2010

Abstract

We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision from an existing knowledge base (Freebase, derived in part from Wikipedia) to train a factor graph model for relation extraction. For inference we run an efficient Gibbs sampler that performs joint inference in linear time. We evaluate our approach both in an in-domain setting (Wikipedia) and in a more realistic out-of-domain setting (New York Times Corpus). In the in-domain setting, our joint model achieves 4% higher precision than an isolated local approach but has no advantage over a pipeline. On the out-of-domain data, we benefit strongly from joint modelling and observe precision improvements of 13% over the pipeline and 15% over the isolated baseline.
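The abstract says that joint inference uses a Gibbs sampler over a factor graph. The following Python sketch is a rough illustration only. It models a single entity pair with three variables: two argument entity types and one relation.

Everything in it is hypothetical toy data, not the paper's features, model structure or learned weights:

- the type inventory;
- the relation names;
- the local scores;
- the soft type-signature penalty.

Each Gibbs sweep resamples every variable once from its conditional given the others. When each conditional touches only the factors neighbouring that variable, a sweep costs time linear in the number of variables. The brute-force exact marginals are included only to check the sampler on this tiny graph.

```python
import itertools
import math
import random

# Hypothetical toy domains: one entity pair with two argument types and one relation.
TYPES = ["PER", "LOC", "ORG"]
RELATIONS = ["nationality", "contains", "NA"]
DOMAINS = {"t1": TYPES, "t2": TYPES, "r": RELATIONS}

# Hypothetical argument-type signatures; NA is compatible with any type pair.
SIGNATURE = {"nationality": ("PER", "LOC"), "contains": ("LOC", "LOC")}

# Hypothetical local log-scores, standing in for weights learned from features.
LOCAL = {
    "t1": {"PER": 1.0, "LOC": 0.8, "ORG": 0.2},
    "t2": {"PER": 0.1, "LOC": 1.2, "ORG": 0.3},
    "r": {"nationality": 0.9, "contains": 0.7, "NA": 0.5},
}
PENALTY = -3.0  # soft log-penalty when a relation's type signature is violated


def score(a):
    """Total log-potential of a full assignment {t1, t2, r}."""
    s = LOCAL["t1"][a["t1"]] + LOCAL["t2"][a["t2"]] + LOCAL["r"][a["r"]]
    sig = SIGNATURE.get(a["r"])
    if sig is not None and (a["t1"], a["t2"]) != sig:
        s += PENALTY  # joint factor coupling the relation with both entity types
    return s


def gibbs(sweeps=20000, burn_in=1000, seed=0):
    """Estimate marginals by repeatedly resampling each variable given the others."""
    rng = random.Random(seed)
    a = {"t1": "ORG", "t2": "ORG", "r": "NA"}  # arbitrary initial state
    counts = {v: {x: 0 for x in dom} for v, dom in DOMAINS.items()}
    for it in range(sweeps):
        for var, dom in DOMAINS.items():
            # For brevity this rescores the whole assignment; a real sampler would
            # only evaluate the factors adjacent to `var`.
            logits = [score({**a, var: x}) for x in dom]
            m = max(logits)
            weights = [math.exp(l - m) for l in logits]
            a[var] = rng.choices(dom, weights=weights)[0]
        if it >= burn_in:
            for var in DOMAINS:
                counts[var][a[var]] += 1
    n = sweeps - burn_in
    return {v: {x: c / n for x, c in cs.items()} for v, cs in counts.items()}


def exact():
    """Exact marginals by brute-force enumeration (feasible only for toy graphs)."""
    names = list(DOMAINS)
    joint = {}
    for values in itertools.product(*(DOMAINS[v] for v in names)):
        joint[values] = math.exp(score(dict(zip(names, values))))
    z = sum(joint.values())
    marg = {v: {x: 0.0 for x in DOMAINS[v]} for v in names}
    for values, w in joint.items():
        for v, x in zip(names, values):
            marg[v][x] += w / z
    return marg


if __name__ == "__main__":
    approx, truth = gibbs(), exact()
    for var in DOMAINS:
        for x in DOMAINS[var]:
            print(f"{var}={x}: gibbs {approx[var][x]:.3f}  exact {truth[var][x]:.3f}")
```

With a fixed seed and enough sweeps, the sampled marginals should track the enumerated ones closely. The soft penalty is what makes the problem joint: evidence about the relation shifts the belief about the argument types, and vice versa.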