Automatic refinement of a POS tagger using a reliable parser and plain text corpora

Authors:
Hideki Hirakawa;Kenji Ono;Yumiko Yoshimura
Affiliations:
Corporate Research & Development Center, Toshiba Corporation, Kawasaki, Japan (all authors)
Venue:
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Year:
2000

Abstract

This paper proposes a new unsupervised learning method for obtaining English part-of-speech (POS) disambiguation rules that improve the accuracy of a POS tagger. The method has been implemented in the experimental system APRAS (Automatic POS Rule Acquisition System), which extracts POS disambiguation rules from plain text corpora by using different types of coded linguistic knowledge, i.e., POS tagging rules and syntactic parsing rules, that are already stored in a fully implemented machine translation (MT) system. In our experiment, the obtained rules were applied to 1.7% of the sentences in a non-training corpus. For this group of sentences, 78.4% of the changes made to the tagging results were improvements. We also observed a 15.5% improvement in tagging and parsing speed and an 8.0% increase in the number of parsable sentences.