Choice of grammatical word-class without global syntactic analysis: tagging words in the LOB Corpus.
Computers and the Humanities
Grammatical analysis by computer of the Lancaster-Oslo/Bergen (LOB) corpus of British English texts
ACL '85 Proceedings of the 23rd annual meeting on Association for Computational Linguistics
Tagging English text with a probabilistic model
Computational Linguistics
Automatic learning for semantic collocation
ANLC '92 Proceedings of the third conference on Applied natural language processing
Extracting noun phrases from large-scale texts: a hybrid approach and its automatic evaluation
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
The UCREL team at the University of Lancaster is developing a robust parsing mechanism that will assign the appropriate grammatical structure to sentences in unconstrained English text. The techniques involve calculating probabilities for competing structures, and are based on those used successfully in tagging the LOB (Lancaster-Oslo/Bergen) corpus, i.e. assigning grammatical word classes to its words.

The first step in the parsing process is a dictionary lookup of successive pairs of grammatically tagged words, which yields a number of possible continuations of the current parse. Since this lookup often cannot determine unambiguously the point at which a grammatical constituent should be closed, the second step has to insert closures and distinguish between alternative parses. It will generate trees representing the possible alternatives, insert closure points for the constituents, and compute a probability for each parse tree from the probabilities of the constituents within it. It will then be able to select a preferred parse or parses for output.

The probability of a grammatical constituent is derived from a bank of manually parsed sentences.
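The scoring step described above can be illustrated with a minimal sketch. It is not the UCREL implementation: the abstract does not say how constituent probabilities are estimated or combined, so the sketch assumes relative-frequency estimates from a toy treebank and a simple product over the constituents of a tree. The tree encoding, the function names, the constituent labels `S`, `N`, `V`, the three example trees, and the `unseen` fallback value are all hypothetical.

```python
from collections import Counter
from math import prod

# Assumed encoding: a tree is (label, children); a leaf child is a word-class tag string.

def constituents(tree):
    """Yield (label, child_labels) for every constituent in the tree."""
    label, children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, child_labels)
    for child in children:
        if not isinstance(child, str):
            yield from constituents(child)

def estimate(treebank):
    """Assumed estimator: relative frequency of each expansion among
    constituents that share the same label in the parsed sentences."""
    counts = Counter(c for tree in treebank for c in constituents(tree))
    totals = Counter()
    for (label, _), n in counts.items():
        totals[label] += n
    return {c: n / totals[c[0]] for c, n in counts.items()}

def tree_probability(tree, probs, unseen=1e-6):
    """Assumed combination rule: product of the constituent probabilities.
    Expansions never seen in the treebank get the small `unseen` value."""
    return prod(probs.get(c, unseen) for c in constituents(tree))

def preferred_parse(candidates, probs):
    """Return the candidate tree with the highest probability."""
    return max(candidates, key=lambda t: tree_probability(t, probs))

# Toy bank of "manually parsed" sentences (invented for illustration).
treebank = [
    ("S", [("N", ["AT", "NN"]), ("V", ["VBD", ("N", ["AT", "NN"])])]),
    ("S", [("N", ["AT", "NN"]), ("V", ["VBD"])]),
    ("S", [("N", ["NN"]), ("V", ["VBD", ("N", ["AT", "NN"])])]),
]
probs = estimate(treebank)

# Two competing parses of the tag sequence AT NN VBD AT NN, differing only
# in where the verb constituent is closed.
late_closure = ("S", [("N", ["AT", "NN"]), ("V", ["VBD", ("N", ["AT", "NN"])])])
early_closure = ("S", [("N", ["AT", "NN"]), ("V", ["VBD"]), ("N", ["AT", "NN"])])
best = preferred_parse([late_closure, early_closure], probs)
```

In this toy setting the parse that closes the verb constituent after the object noun phrase is preferred, because the alternative needs a sentence-level expansion that never occurs in the bank. A real system would need a principled treatment of unseen constituents rather than a fixed constant.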