Learning condensed feature representations from large unsupervised data sets for supervised learning

  • Authors:
  • Jun Suzuki;Hideki Isozaki;Masaaki Nagata

  • Affiliations:
  • NTT Communication Science Laboratories, NTT Corp., Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan (all authors)

  • Venue:
  • HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
  • Year:
  • 2011

Abstract

This paper proposes a novel approach for effectively utilizing unsupervised data in addition to supervised data for supervised learning. We use unsupervised data to generate informative 'condensed feature representations' from the original feature set used in supervised NLP systems. The main contribution of our method is that it offers dense, low-dimensional feature spaces for NLP tasks while maintaining the state-of-the-art performance provided by recently developed high-performance semi-supervised learning techniques. Our method matches the results of current state-of-the-art systems while using very few features: F-score 90.72 with 344 features on CoNLL-2003 NER data, and UAS 93.55 with 12.5K features on dependency parsing data derived from PTB-III.
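
To illustrate the general idea described in the abstract, the sketch below shows one way a large sparse feature space can be condensed into a dense, low-dimensional one learned from unlabeled data before supervised training. This is not the paper's actual algorithm; the use of scikit-learn, TruncatedSVD as the condenser, and the toy data are all assumptions made purely for illustration.

    # Illustrative sketch only (NOT the paper's method): condense a sparse
    # bag-of-words feature space into a dense, low-dimensional representation
    # learned from unlabeled data, then train a supervised classifier on it.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.linear_model import LogisticRegression

    # Hypothetical toy data standing in for a large unlabeled corpus
    # and a small labeled training set.
    unlabeled_texts = [
        "the cat sat on the mat",
        "dogs and cats are common pets",
        "stock prices fell sharply today",
        "the market closed lower on friday",
    ]
    train_texts = ["my cat chased the dog", "shares dropped after the report"]
    train_labels = ["pets", "finance"]

    # Original sparse feature space (bag-of-words), fit on the unlabeled data.
    vectorizer = CountVectorizer()
    X_unlabeled = vectorizer.fit_transform(unlabeled_texts)

    # Learn a condensed (dense, low-dimensional) feature space from the
    # unlabeled data alone.
    condenser = TruncatedSVD(n_components=2, random_state=0)
    condenser.fit(X_unlabeled)

    # Map the labeled data into the condensed space and train a supervised model.
    X_train = condenser.transform(vectorizer.transform(train_texts))
    clf = LogisticRegression().fit(X_train, train_labels)

    # Predict on new text using the same condensed representation.
    print(clf.predict(condenser.transform(vectorizer.transform(["the dog barked"]))))

The point of the sketch is only the workflow: the condensed representation is learned from unlabeled data and is far smaller than the original feature set, which is what allows the supervised model to work with very few dense features.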