Linguistic indeterminacy as a source of errors in tagging

Authors:
Gunnel Källgren
Affiliations:
Stockholm University, Stockholm, Sweden
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Year:
1996

Abstract

Most evaluations of part-of-speech tagging compare the output of an automatic tagger to some established standard, define the differences as tagging errors, and try to remedy them by, e.g., more training of the tagger. The present article is based on a manual analysis of a large number of tagging errors. Some clear patterns among the errors can be discerned, and the sources of the errors, as well as possible alternative methods of remedy, are presented and discussed. In particular, the problems posed by undecidable cases are treated.