Automatic extraction of grammars from annotated text

Authors:
Salim Roukos
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY
Venue:
HLT '94 Proceedings of the workshop on Human Language Technology
Year:
1994

Abstract

The primary objective of this project is to develop a robust, high-performance parser for English by automatically extracting a grammar from an annotated corpus of bracketed sentences, called the Treebank. The project is a collaboration between the IBM Continuous Speech Recognition Group and the University of Pennsylvania Department of Computer Sciences. Our initial focus is the domain of computer manuals with a vocabulary of 3000 words. We use a Treebank that was developed jointly by IBM and the University of Lancaster, England.