Performance analysis of a part of speech tagging task

Authors:
Rada Mihalcea
Affiliations:
University of North Texas, Computer Science Department, Denton, TX
Venue:
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
Year:
2003

Abstract

In this paper, we attempt a formal analysis of performance in automatic part-of-speech tagging. We provide lower and upper bounds on the tagging precision attainable with existing taggers or their combination. Since we show that perfect automatic tagging is not possible with existing taggers, we offer two solutions for applications requiring very high precision: (1) a solution involving minimal human intervention that achieves a precision of over 98.7%, and (2) a combination of taggers using a memory-based learning algorithm that reduces the error rate by 11.6% with respect to the best tagger involved.
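
The quantities named in the abstract can be illustrated with a minimal sketch. The abstract does not state how the paper defines its bounds, so the definitions below are assumptions chosen for illustration: the lower bound counts tokens that every tagger labels correctly, and the upper bound counts tokens that at least one tagger labels correctly. The function names and the toy tag sequences are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Share of the baseline's errors removed by the new system."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def combination_bounds(
    outputs: List[Sequence[str]], gold: Sequence[str]
) -> Tuple[float, float]:
    """Assumed bounds for a tagger combination (not the paper's definitions).

    lower: tokens that all taggers tag correctly
    upper: tokens that at least one tagger tags correctly
    """
    n = len(gold)
    all_right = sum(all(o[i] == gold[i] for o in outputs) for i in range(n))
    any_right = sum(any(o[i] == gold[i] for o in outputs) for i in range(n))
    return all_right / n, any_right / n


# Toy data, invented for illustration only.
gold = ["DT", "NN", "VBZ", "JJ", "NN"]
tagger_a = ["DT", "NN", "VBZ", "NN", "NN"]  # one error, at position 3
tagger_b = ["DT", "VB", "VBZ", "JJ", "NN"]  # one error, at position 1

lower, upper = combination_bounds([tagger_a, tagger_b], gold)
print(f"tagger A accuracy: {accuracy(tagger_a, gold):.2f}")
print(f"assumed lower bound: {lower:.2f}, assumed upper bound: {upper:.2f}")
```

Under this reading, an error reduction of 11.6% is relative to the best tagger's error rate: if the best tagger made 100 errors on a test set, the combination would make roughly 88. It is not an 11.6-point gain in precision.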