The performance of taggers is usually evaluated by their success rate, expressed as a percentage. Because such an approach is purely quantitative, all errors committed by the tagger are treated as equal for the purpose of the evaluation. This paper takes a different, qualitative stance on the topic. It argues that the quantitative viewpoint is not linguistically adequate, because errors may differ in severity. General implications for tagging are discussed, and a simple method is proposed and exemplified. The method is able (1) to detect, and in some cases even rectify, the most severe errors and thus (2) to contribute to finally arriving at a better-tagged corpus. Some encouraging results are given, obtained from a very simple, manually performed test and evaluation on a small sample of a corpus.
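The following Python sketch illustrates the contrast drawn above between a purely quantitative success rate and an evaluation in which errors differ in severity. It is not the detection method proposed in the paper. The tag names, the function `coarse_severity`, and its weights (0.5 for confusions within one coarse class, 1.0 otherwise) are hypothetical assumptions chosen only for illustration.

```python
from typing import Callable, Sequence


# Hypothetical severity function (an assumption, not from the paper):
# confusing two tags of the same coarse class, approximated here by a shared
# first character (e.g. "NN" vs "NE"), counts as a minor error; any other
# confusion counts as a severe one.
def coarse_severity(gold_tag: str, predicted_tag: str) -> float:
    return 0.5 if gold_tag[:1] == predicted_tag[:1] else 1.0


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Percentage success rate: every error is treated as equal."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty tag sequences of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


def weighted_error(
    gold: Sequence[str],
    predicted: Sequence[str],
    severity: Callable[[str, str], float],
) -> float:
    """Sum of severity weights over mistagged tokens (lower is better)."""
    if len(gold) != len(predicted):
        raise ValueError("tag sequences must have equal length")
    return sum(severity(g, p) for g, p in zip(gold, predicted) if g != p)


if __name__ == "__main__":
    gold = ["ART", "NN", "VVFIN", "ADJA"]
    tagger_a = ["ART", "NE", "VVFIN", "ADJA"]     # NN -> NE: minor confusion
    tagger_b = ["ART", "VVFIN", "VVFIN", "ADJA"]  # NN -> VVFIN: severe confusion
    for name, pred in [("A", tagger_a), ("B", tagger_b)]:
        print(name, accuracy(gold, pred), weighted_error(gold, pred, coarse_severity))
```

Both hypothetical taggers score 75.0 percent, so a purely quantitative evaluation cannot distinguish them. The severity-weighted count (0.5 vs 1.0), however, ranks tagger A's error as less harmful. Any real weighting would have to be grounded in linguistic analysis of which confusions matter.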