Towards history-based grammars: using richer models for probabilistic parsing
ACL '93: Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics
The primary objective of this project is to develop a robust, high-performance parser for English by automatically extracting a grammar from an annotated corpus of bracketed sentences, called the Treebank. The project is a collaboration between the IBM Continuous Speech Recognition Group and the University of Pennsylvania Department of Computer Sciences. Our initial focus is the domain of computer manuals with a vocabulary of 3000 words. We use a Treebank that was developed jointly by IBM and the University of Lancaster, England.