Semisupervised condensed nearest neighbor for part-of-speech tagging

  • Authors:
  • Anders Søgaard

  • Affiliations:
  • University of Copenhagen, Njalsgade, Copenhagen S

  • Venue:
  • HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper introduces a new training set condensation technique designed for mixtures of labeled and unlabeled data. It finds a condensed set of labeled and unlabeled data points, typically smaller than what is obtained using condensed nearest neighbor on the labeled data only, and improves classification accuracy. We evaluate the algorithm on semi-supervised part-of-speech tagging and present the best published result on the Wall Street Journal data set.