Mistake-driven mixture of hierarchical tag context trees

  • Authors:
  • Masahiko Haruno; Yuji Matsumoto

  • Affiliations:
  • NTT Communication Science Laboratories, Hikari-No-Oka, Yokosuka-Shi, Kanagawa, Japan; NAIST, Takayama-cho, Ikoma-Shi, Nara, Japan

  • Venue:
  • ACL '97 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
  • Year:
  • 1997

Abstract

This paper proposes a mistake-driven mixture method for learning a tag model. The method iteratively performs two procedures: (1) constructing a tag model based on the current data distribution, and (2) updating the distribution by focusing on data that are not well predicted by the constructed model. The final tag model is constructed by mixing all the models according to their performance. To reflect the data distribution well, we represent each tag model as a hierarchical tag context tree. By using the hierarchical tag context tree, the constituents of successive tag models gradually change from broad-coverage tags (e.g., noun) to specific exceptional words that cannot be captured by general tags. In other words, the method incorporates not only frequent connections but also infrequent ones that are often considered to be collocational. We evaluate several tag models by implementing Japanese part-of-speech taggers that share all conditions (i.e., dictionary and word model) other than their tag models. The experimental results show that the proposed method significantly outperforms both hand-crafted and conventional statistical methods.
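The iterative loop the abstract describes — fit a model on the current distribution, up-weight the mispredicted data, and mix the resulting models by performance — follows the general boosting pattern. The sketch below illustrates that loop in Python under stated assumptions: it stands in a weighted per-word majority tagger for the paper's hierarchical tag context trees, and all names (`train_unigram_tagger`, `mistake_driven_mixture`, `predict`) are illustrative, not from the paper.

```python
import math
from collections import defaultdict

def train_unigram_tagger(data, weights):
    """Weighted majority tag per word (toy stand-in for a tag context tree)."""
    votes = defaultdict(lambda: defaultdict(float))
    for (word, tag), w in zip(data, weights):
        votes[word][tag] += w
    return {word: max(tags, key=tags.get) for word, tags in votes.items()}

def mistake_driven_mixture(data, rounds=5):
    """Boosting-style loop: fit, up-weight mistakes, mix by performance."""
    n = len(data)
    weights = [1.0 / n] * n
    mixture = []  # list of (alpha, model) pairs
    for _ in range(rounds):
        model = train_unigram_tagger(data, weights)
        miss = [model[word] != tag for (word, tag) in data]
        err = sum(w for w, m in zip(weights, miss) if m)
        if err == 0.0 or err >= 0.5:
            # perfect (or useless) round: keep the model and stop
            mixture.append((1.0, model))
            break
        # models that perform better receive a larger mixing weight
        alpha = 0.5 * math.log((1 - err) / err)
        mixture.append((alpha, model))
        # focus the distribution on data the current model mispredicts
        weights = [w * math.exp(alpha if m else -alpha)
                   for w, m in zip(weights, miss)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return mixture

def predict(mixture, word):
    """Final tag: weighted vote over all models in the mixture."""
    score = defaultdict(float)
    for alpha, model in mixture:
        if word in model:
            score[model[word]] += alpha
    return max(score, key=score.get) if score else None
```

With an ambiguously tagged word in the training data, early rounds favor the majority tag while later rounds, trained on the re-focused distribution, capture the minority (exception-like) cases — mirroring the paper's shift from broad-coverage tags toward specific exceptional words.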