Extreme extraction: machine reading in a week

Authors:
Marjorie Freedman;Lance Ramshaw;Elizabeth Boschee;Ryan Gabbard;Gary Kratkiewicz;Nicolas Ward;Ralph Weischedel
Affiliations:
Raytheon BBN Technologies, Cambridge, MA;Raytheon BBN Technologies, Cambridge, MA;Raytheon BBN Technologies, Cambridge, MA;Raytheon BBN Technologies, Cambridge, MA;Raytheon BBN Technologies, Cambridge, MA;Raytheon BBN Technologies, Cambridge, MA;Raytheon BBN Technologies, Cambridge, MA
Venue:
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011

Citing 15
Cited 0

Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Snowball: extracting relations from large plain-text collections

DL '00 Proceedings of the fifth ACM conference on Digital libraries
Surprise! What's in a Cebuano or Hindi Name?

ACM Transactions on Asian Language Information Processing (TALIP)
Hindi-english cross-lingual question-answering system

ACM Transactions on Asian Language Information Processing (TALIP)
Rapid development of Hindi named entity recognition using conditional random fields and feature induction

ACM Transactions on Asian Language Information Processing (TALIP)
Rapid customization of an information extraction system for a surprise language

ACM Transactions on Asian Language Information Processing (TALIP)
Message Understanding Conference-6: a brief history

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Experiments in multi-modal automatic content extraction

HLT '01 Proceedings of the first international conference on Human language technology research
Learning surface text patterns for a Question Answering system

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Espresso: leveraging generic patterns for automatically harvesting semantic relations

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Relation extraction using label propagation based semi-supervised learning

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Populating the Semantic Web by Macro-reading Internet Text

ISWC '09 Proceedings of the 8th International Semantic Web Conference
Not all seeds are equal: measuring the quality of text mining seeds

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Automatically generating extraction patterns from untagged text

AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 2
Empirical studies in learning to read

FAM-LbR '10 Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

Quantified Score

Hi-index	0.00

Visualization

Abstract

We report on empirical results in extreme extraction. It is extreme in that (1) from receipt of the ontology specifying the target concepts and relations, development is limited to one week and that (2) relatively little training data is assumed. We are able to surpass human recall and achieve an F1 of 0.51 on a question-answering task with less than 50 hours of effort using a hybrid approach that mixes active learning, bootstrapping, and limited (5 hours) manual rule writing. We compare the performance of three systems: extraction with handwritten rules, bootstrapped extraction, and a combination. We show that while the recall of the handwritten rules surpasses that of the learned system, the learned system is able to improve the overall recall and F1.