The Possibilities of Automatic Detection/Correction of Errors in Tagged Corpora: A Pilot Study on a German Corpus

Authors:
Karel Oliva
Venue:
TSD '01 Proceedings of the 4th International Conference on Text, Speech and Dialogue
Year:
2001

Abstract

The performance of taggers is usually evaluated by their success rate, expressed as a percentage. Because this approach is purely quantitative, all errors committed by the tagger are treated on a par for the purpose of the evaluation. This paper takes a different, qualitative stance on the topic. It argues that the quantitative viewpoint is not linguistically adequate, because errors may differ in severity. General implications for tagging are discussed. A simple method is then proposed and exemplified that is able to (1) detect, and in some cases even rectify, the most severe errors, and thus (2) contribute to finally arriving at a better tagged corpus. Some encouraging results are given, obtained by a very simple, manually performed test and evaluation on a small sample of a corpus.
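The contrast between a percentage success rate and error severity can be illustrated with a small sketch. It assumes, purely for illustration, that "severe" errors are those producing a tag bigram that cannot occur in correct German. The STTS tag names are real, but the blacklist `INVALID_BIGRAMS` and the helper names `accuracy` and `flag_invalid_bigrams` are hypothetical and are not taken from the paper. In the example, two taggings of the same sentence receive the same accuracy, yet only one of them contains an impossible tag sequence.

```python
# Hypothetical blacklist of STTS tag bigrams assumed never to occur
# in correctly tagged German text (illustrative, not from the paper).
INVALID_BIGRAMS = {
    ("ART", "VVFIN"),   # article directly followed by a finite full verb
    ("ART", "$."),      # article directly followed by sentence-final punctuation
    ("APPR", "VVFIN"),  # preposition directly followed by a finite full verb
}


def accuracy(predicted, gold):
    """Percentage of tokens whose predicted tag equals the gold tag."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


def flag_invalid_bigrams(tags):
    """Return positions i where (tags[i], tags[i + 1]) is a blacklisted bigram."""
    return [
        i
        for i in range(len(tags) - 1)
        if (tags[i], tags[i + 1]) in INVALID_BIGRAMS
    ]


if __name__ == "__main__":
    # "Der Hund schläft ." with its gold tags and two erroneous taggings.
    gold = ["ART", "NN", "VVFIN", "$."]
    tagger_a = ["ART", "NN", "VVINF", "$."]     # one error, plausible sequence
    tagger_b = ["ART", "VVFIN", "VVFIN", "$."]  # one error, impossible sequence
    for name, tags in [("A", tagger_a), ("B", tagger_b)]:
        print(name, accuracy(tags, gold), flag_invalid_bigrams(tags))
```

Both taggings score 75.0%, but only tagging B is flagged (at position 0). A purely quantitative evaluation cannot distinguish between the two errors.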