Algorithms that learn to extract information: BBN: TIPSTER phase III

Authors:
Scott Miller;Michael Crystal;Heidi Fox;Lance Ramshaw;Richard Schwartz;Rebecca Stone;Ralph Weischedel
Affiliations:
BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA
Venue:
TIPSTER '98 Proceedings of a workshop held at Baltimore, Maryland: October 13-15, 1998
Year:
1998

Abstract

All of BBN's research under the TIPSTER III program has focused on doing extraction by applying statistical models trained on annotated data, rather than by using programs that execute hand-written rules. Within the context of MUC-7, the SIFT system for extraction of template entities (TE) and template relations (TR) used a novel, integrated syntactic/semantic language model to extract sentence level information, and then synthesized information across sentences using in part a trained model for cross-sentence relations. At the named entity (NE) level as well, in both MET-1 and MUC-7, BBN employed a trained, HMM-based model.

The results in these TIPSTER evaluations are evidence that such trained systems, even at their current level of development, can perform roughly on a par with those based on rules hand-tailored by experts. In addition, such trained systems have some significant advantages:
• They can be easily ported to new domains by simply annotating fresh data.
• The complex interactions that make rule-based systems difficult to develop and maintain can here be learned automatically from the training data.

We believe that improved and extended versions of such trained models have the potential for significant further progress toward practical systems for information extraction.