Feature-based models for improving the quality of noisy training data for relation extraction

Authors:
Benjamin Roth;Dietrich Klakow
Affiliations:
Saarland University, Saarbrucken, Germany;Saarland University, Saarbrucken, Germany
Venue:
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Year:
2013

Citing 9
Cited 1

Exploring content models for multi-document summarization

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Open information extraction from the web

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Distant supervision for relation extraction without labeled data

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Collective cross-document relation extraction without labelled data

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Knowledge-based weak supervision for information extraction of overlapping relations

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Knowledge base population: successful approaches and challenges

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Structured relation discovery using generative models

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Pattern learning for relation extraction with a hierarchical topic model

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Multi-instance multi-label learning for relation extraction

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

A survey of noise reduction methods for distant supervision

Proceedings of the 2013 workshop on Automated knowledge base construction

Quantified Score

Hi-index	0.00

Visualization

Abstract

Supervised relation extraction from text relies on annotated data. Distant supervision is a scheme to obtain noisy training data by using a knowledge base of relational tuples as the ground truth and finding entity pair matches in a text corpus. We propose and evaluate two feature-based models for increasing the quality of distant supervision extraction patterns. The first model is an extension of a hierarchical topic model that induces background, relation specific and argument-pair specific feature distributions. The second model is a perceptron, trained to match an objective function that enforces two constraints: 1) an at-least-one semantics, i.e. at least one training example per relational tuple is assumed to be correct; 2) high scores for a dedicated NIL label that accounts for the noise in the training data. For both algorithms, neither explicit negative data nor the ratio of negatives has to be provided. Both algorithms give improvements over a maximum likelihood baseline as well as over a previous topic model without features, evaluated on TAC KBP data.