A probabilistic approach to grammatical analysis of written English by computer

Authors:
Andrew David Beale
Affiliations:
University of Lancaster, Bowland College, Bailrigg, Lancaster, England
Venue:
EACL '85 Proceedings of the second conference on European chapter of the Association for Computational Linguistics
Year:
1985

Abstract

Work at the Unit for Computer Research on the English Language at the University of Lancaster has been directed towards producing a grammatically annotated version of the Lancaster-Oslo/Bergen (LOB) Corpus of written British English texts as the preliminary stage in developing computer programs and data files for providing a grammatical analysis of unrestricted English text.

From 1981 to 1983, a suite of PASCAL programs was devised to automatically produce a single level of grammatical description with one word tag representing the word class or part of speech of each word token in the corpus. Error analysis and subsequent modification to the system resulted in over 96 per cent of word tags being correctly assigned automatically. The remaining 3 to 4 per cent were corrected by human post-editors.

Work is now in progress to devise a suite of programs to provide a constituent analysis of the sentences in the corpus. So far, sample sentences have been automatically assigned phrase and clause tags using a probabilistic system similar to word tagging. It is hoped that the entire corpus will eventually be parsed.
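The abstract does not spell out how the probabilistic word tagging works, so the following is only a minimal sketch of one common way to pick a single tag per word: each word is looked up in a lexicon of candidate tags, and the tag sequence with the highest product of tag-pair transition probabilities is chosen. The lexicon, the tag labels, the probability values, and the function name `best_tags` are all hypothetical and are not taken from the Lancaster system or the LOB tagset documentation.

```python
# Hypothetical lexicon: each word form maps to its candidate tags.
LEXICON = {
    "the": ["AT"],
    "run": ["NN", "VB"],
    "ended": ["VBD"],
}

# Hypothetical transition probabilities P(tag | previous tag).
TRANSITIONS = {
    ("START", "AT"): 0.5,
    ("AT", "NN"): 0.6,
    ("AT", "VB"): 0.01,
    ("NN", "VBD"): 0.3,
    ("VB", "VBD"): 0.02,
}

# Small fallback probability for tag pairs missing from the table (assumption).
DEFAULT_PROB = 1e-4


def best_tags(words):
    """Return the most probable tag sequence for a list of known words."""
    # For each tag of the previous word, keep the best (score, path) ending there.
    best = {"START": (1.0, [])}
    for word in words:
        candidates = LEXICON[word.lower()]
        step = {}
        for tag in candidates:
            # Choose the best predecessor for this candidate tag.
            score, path = max(
                (
                    (prev_score * TRANSITIONS.get((prev_tag, tag), DEFAULT_PROB), prev_path)
                    for prev_tag, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            step[tag] = (score, path + [tag])
        best = step
    # Pick the highest-scoring path over all tags of the final word.
    return max(best.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    print(best_tags(["The", "run", "ended"]))
```

In this toy example the ambiguous word "run" is resolved as a noun because the article-to-noun transition is assigned a much higher probability than article-to-verb. A constituent-level analysis of the kind mentioned in the abstract would need its own tables of phrase and clause tag probabilities, which are not given here.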