Improved extraction assessment through better language models

  • Authors: Arun Ahuja; Doug Downey
  • Affiliations: Northwestern University, Evanston, IL; Northwestern University, Evanston, IL
  • Venue: HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
  • Year: 2010

Abstract

A variety of information extraction techniques rely on the fact that instances of the same relation are "distributionally similar," in that they tend to appear in similar textual contexts. We demonstrate that extraction accuracy depends heavily on the accuracy of the language model utilized to estimate distributional similarity. An unsupervised model selection technique based on this observation is shown to reduce extraction and type-checking error by 26% over previous results, in experiments with Hidden Markov Models. The results suggest that optimizing statistical language models over unlabeled data is a promising direction for improving weakly supervised and unsupervised information extraction.
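The abstract's core recipe, fitting candidate language models on unlabeled text, keeping the one that models that text best, and using it to estimate distributional similarity, can be illustrated with a small sketch. The code below is an assumption-laden illustration rather than the authors' system: simple add-one-smoothed bigram models stand in for the paper's HMM language models, and held-out perplexity is assumed as the unsupervised selection criterion, since the abstract does not spell out the exact procedure.

```python
# Minimal sketch (not the authors' code): unsupervised language-model selection
# by held-out perplexity on unlabeled text. Bigram models with different rare-word
# cutoffs stand in for the HMMs used in the paper; the selection logic is the same.
import math
from collections import Counter


class BigramLM:
    """Add-one-smoothed bigram language model (a stand-in for an HMM LM)."""

    def __init__(self, vocab_cutoff=1):
        self.vocab_cutoff = vocab_cutoff  # words seen fewer times map to <unk>

    def fit(self, tokens):
        counts = Counter(tokens)
        self.vocab = {w for w, c in counts.items() if c >= self.vocab_cutoff}
        self.vocab.add("<unk>")
        mapped = [w if w in self.vocab else "<unk>" for w in tokens]
        self.bigrams = Counter(zip(mapped, mapped[1:]))
        self.unigrams = Counter(mapped)
        self.V = len(self.vocab)
        return self

    def log_prob(self, tokens):
        """Total log-probability of a token sequence under the model."""
        mapped = [w if w in self.vocab else "<unk>" for w in tokens]
        lp = 0.0
        for prev, cur in zip(mapped, mapped[1:]):
            num = self.bigrams[(prev, cur)] + 1          # add-one smoothing
            den = self.unigrams[prev] + self.V
            lp += math.log(num / den)
        return lp


def perplexity(model, tokens):
    """Per-token perplexity on held-out text (lower is better)."""
    n = max(len(tokens) - 1, 1)
    return math.exp(-model.log_prob(tokens) / n)


def select_model(candidates, train_tokens, heldout_tokens):
    """Fit each candidate on unlabeled text; keep the lowest-perplexity one."""
    fitted = [m.fit(train_tokens) for m in candidates]
    return min(fitted, key=lambda m: perplexity(m, heldout_tokens))


if __name__ == "__main__":
    # Toy unlabeled corpus and held-out split (hypothetical data).
    train = "the city of chicago is a city in illinois the city of boston".split()
    heldout = "the city of evanston is a city in illinois".split()
    candidates = [BigramLM(vocab_cutoff=1), BigramLM(vocab_cutoff=2)]
    best = select_model(candidates, train, heldout)
    print("selected cutoff:", best.vocab_cutoff,
          "held-out perplexity:", round(perplexity(best, heldout), 2))
```

The selected model would then supply context probabilities for scoring how distributionally similar a candidate extraction is to known instances of a relation; the point of the sketch is only that the selection step needs no labeled data.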