Active learning of extractive reference summaries for lecture speech summarization

  • Authors:
  • Justin Jian Zhang; Pascale Fung

  • Affiliations:
  • Hong Kong University of Science and Technology (HKUST), Hong Kong; Hong Kong University of Science and Technology (HKUST), Hong Kong

  • Venue:
  • BUCC '09 Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
  • Year:
  • 2009

Abstract

We propose using active learning to tag extractive reference summaries of lecture speech. Training a feature-based summarization model usually requires a large amount of training data with high-quality reference summaries. Producing such summaries by hand is tedious and, because inter-labeler agreement is low, highly unreliable. Active learning alleviates this problem by automatically selecting a small number of unlabeled documents for humans to correct by hand. Our method chooses unlabeled documents according to the similarity score between each document and a comparable resource, namely the PowerPoint slides. After manual correction, the selected documents are added to the training pool. Summarization results show the ROUGE-L F-measure rising along the learning curve from 0.44 to 0.514, consistently higher than the curve obtained with randomly chosen training samples.
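The selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Transcripts and slides are represented as bag-of-words term counts, and similarity is plain cosine similarity. Each transcript is assumed to be paired with its own slide deck through a shared id. Documents with the highest similarity are queried first, although the abstract does not say which end of the ranking is chosen. The names `select_for_labeling` and `active_learning_round` are hypothetical, as is the `correct` callback, which stands in for human correction.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace tokenization into term counts (an assumed, simplistic representation)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors; 0.0 if either vector is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_for_labeling(unlabeled: Dict[str, str], slides: Dict[str, str], k: int) -> List[str]:
    """Return the ids of the k unlabeled transcripts most similar to their paired slides."""
    scores = {
        doc_id: cosine_similarity(bag_of_words(text), bag_of_words(slides[doc_id]))
        for doc_id, text in unlabeled.items()
    }
    # Assumption: highest-similarity documents are queried first; ties are broken by id.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k]


def active_learning_round(
    unlabeled: Dict[str, str],
    slides: Dict[str, str],
    training_pool: Dict[str, str],
    correct: Callable[[str, str], str],
    k: int,
) -> None:
    """One round: select k documents, have a human correct them, and move them into the training pool."""
    for doc_id in select_for_labeling(unlabeled, slides, k):
        # `correct` is a hypothetical stand-in for manual correction of the reference summary.
        training_pool[doc_id] = correct(doc_id, unlabeled.pop(doc_id))


# Usage with toy data (hypothetical lecture ids and text).
unlabeled = {
    "lec1": "gradient descent minimizes the loss",
    "lec2": "today we talk about the weather",
}
slides = {
    "lec1": "gradient descent loss minimization",
    "lec2": "hidden markov models",
}
pool: Dict[str, str] = {}
active_learning_round(unlabeled, slides, pool, correct=lambda doc_id, text: text, k=1)
print(sorted(pool))  # ['lec1']
```

In a full loop, the summarization model would be retrained on `training_pool` after each round. Replacing `select_for_labeling` with random sampling would give a random-selection baseline of the kind the abstract compares against.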