Automatic extraction of grammars from annotated text

  • Authors:
  • Salim Roukos

  • Affiliations:
  • IBM T.J. Watson Research Center, Yorktown Heights, NY

  • Venue:
  • HLT '94 Proceedings of the workshop on Human Language Technology
  • Year:
  • 1994

Quantified Score

Hi-index 0.02

Visualization

Abstract

The primary objective of this project is to develop a robust, high-performance parser for English by automatically extracting a grammar from an annotated corpus of bracketed sentences, called the Treebank. The project is a collaboration between the IBM Continuous Speech Recognition Group and the University of Pennsylvania Department of Computer Sciences. Our initial focus is the domain of computer manuals with a vocabulary of 3000 words. We use a Treebank that was developed jointly by IBM and the University of Lancaster, England.