Automatic refinement of a POS tagger using a reliable parser and plain text corpora

  • Authors:
  • Hideki Hirakawa; Kenji Ono; Yumiko Yoshimura

  • Affiliations:
  • Corporate Research & Development Center, Toshiba Corporation, Kawasaki, Japan (all authors)

  • Venue:
  • COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
  • Year:
  • 2000

Abstract

This paper proposes a new unsupervised learning method for obtaining English part-of-speech (POS) disambiguation rules that improve the accuracy of a POS tagger. The method has been implemented in the experimental system APRAS (Automatic POS Rule Acquisition System), which extracts POS disambiguation rules from plain text corpora by utilizing different types of coded linguistic knowledge, i.e., POS tagging rules and syntactic parsing rules, which are already stored in a fully implemented machine translation (MT) system. In our experiment, the obtained rules were applied to 1.7% of the sentences in a non-training corpus. For this group of sentences, 78.4% of the changes made in tagging results were improvements. We also saw a 15.5% improvement in tagging and parsing speed and an 8.0% increase in parsable sentences.