Active Zipfian sampling for statistical parser training

  • Authors:
  • Onur Çobanoğlu

  • Affiliations:
  • University of Pittsburgh, Pittsburgh, PA

  • Venue:
  • NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
  • Year:
  • 2009

Abstract

Active learning has proven to be a successful strategy for the rapid development of corpora used to train statistical natural language parsers. The vast majority of studies in this field have focused on estimating the informativeness of samples; however, the representativeness of samples is another important criterion to consider in active learning. We present a novel metric for estimating the representativeness of sentences, based on a modification of Zipf's Principle of Least Effort. Experiments on the WSJ corpus with a wide-coverage parser show that our method always performs at least as well as, and generally significantly better than, alternative representativeness-based methods.