A fast boosting-based learner for feature-rich tagging and chunking

  • Authors:
  • Tomoya Iwakura; Seishi Okamoto

  • Affiliations:
  • Fujitsu Laboratories Ltd., Nakahara-ku, Kawasaki, Japan; Fujitsu Laboratories Ltd., Nakahara-ku, Kawasaki, Japan

  • Venue:
  • CoNLL '08 Proceedings of the Twelfth Conference on Computational Natural Language Learning
  • Year:
  • 2008

Abstract

Combinations of features yield significantly higher accuracy than atomic features alone on tasks such as part-of-speech (POS) tagging and text chunking. However, selecting feature combinations when learning from large-scale, feature-rich training data requires a long training time. We propose a fast boosting-based algorithm for learning rules represented by combinations of features. Our algorithm constructs a set of rules by repeatedly selecting several rules from a small proportion of the candidate rules. The candidate rules are generated from a subset of all the features with a technique similar to beam search. We then propose POS tagging and text chunking methods based on our learning algorithm. Our tagger and chunker use candidate POS tags or chunk tags for each word, collected from automatically tagged data. We evaluate our methods on English POS tagging and text chunking. The experimental results show that training with our algorithm is, on average, about 50 times faster than training Support Vector Machines with a polynomial kernel, while maintaining state-of-the-art accuracy and achieving faster classification.
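
The training loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the rule scoring function, the weight update, or how the feature subsets are formed. The sketch assumes a binary task, a confidence-rated AdaBoost-style update, a gain of |sqrt(W+) - sqrt(W-)|, and round-robin feature buckets. All names (`gain`, `candidates`, `train`, `predict`) and parameters (`n_buckets`, `top_k`, `beam`, `max_size`, `eps`) are hypothetical.

```python
import math

def gain(rule, data, weights):
    """Score a rule (a frozenset of features) on the weighted examples it fires on.
    Assumed scoring: |sqrt(W+) - sqrt(W-)|; the abstract does not name one."""
    w_pos = sum(w for (feats, y), w in zip(data, weights) if rule <= feats and y == 1)
    w_neg = sum(w for (feats, y), w in zip(data, weights) if rule <= feats and y == -1)
    return abs(math.sqrt(w_pos) - math.sqrt(w_neg)), w_pos, w_neg

def rank(rules, data, weights):
    """Order rules by decreasing gain, with a deterministic tie-break."""
    return sorted(rules, key=lambda r: (-gain(r, data, weights)[0], sorted(r)))

def candidates(seeds, all_feats, data, weights, beam, max_size):
    """Beam-like generation: start from a subset of the features and grow
    conjunctions, keeping only the `beam` best rules of each size."""
    frontier = [frozenset([f]) for f in seeds]
    pool = set(frontier)
    for _ in range(max_size - 1):
        expanded = {r | {f} for r in frontier for f in all_feats if f not in r}
        frontier = rank(expanded, data, weights)[:beam]
        pool.update(frontier)
    return pool

def train(data, rounds=2, n_buckets=2, top_k=2, beam=3, max_size=2, eps=1e-3):
    all_feats = sorted({f for feats, _ in data for f in feats})
    # Assumed partitioning: each round sees only one bucket of seed features.
    buckets = [all_feats[i::n_buckets] for i in range(n_buckets)]
    weights = [1.0 / len(data)] * len(data)
    rules = {}
    for t in range(rounds):
        pool = candidates(buckets[t % n_buckets], all_feats, data, weights, beam, max_size)
        # Select several rules per round from this small candidate pool.
        for rule in rank(pool, data, weights)[:top_k]:
            _, w_pos, w_neg = gain(rule, data, weights)
            conf = 0.5 * math.log((w_pos + eps) / (w_neg + eps))
            rules[rule] = rules.get(rule, 0.0) + conf
            # Down-weight examples the rule already classifies correctly.
            weights = [w * math.exp(-y * conf) if rule <= feats else w
                       for (feats, y), w in zip(data, weights)]
    return rules

def predict(rules, feats):
    """Sum the confidences of all rules that fire; the sign is the label."""
    score = sum(c for r, c in rules.items() if r <= feats)
    return 1 if score >= 0 else -1

# Toy data: label +1 when the previous word is "the", otherwise -1.
data = [
    ({"prev=the", "suf=s"}, 1),
    ({"prev=the", "suf=ed"}, 1),
    ({"prev=to", "suf=s"}, -1),
    ({"prev=to", "suf=ed"}, -1),
]
rules = train(data, rounds=2)
print(predict(rules, {"prev=the", "suf=ing"}))
print(predict(rules, {"prev=to", "suf=s"}))
```

The speed-up idea in this sketch is that each round scores only the conjunctions reachable from one feature bucket through a bounded beam, rather than every feature combination, and adds `top_k` rules per round instead of one. Extending it to multi-class tagging and chunking would require a choice the abstract does not detail.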