Active learning of extractive reference summaries for lecture speech summarization

Authors:
Justin Jian Zhang;Pascale Fung
Affiliations:
Hong Kong University of Science and Technology (HKUST), Hong Kong;Hong Kong University of Science and Technology (HKUST), Hong Kong
Venue:
BUCC '09 Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
Year:
2009

Abstract

We propose using active learning to tag extractive reference summaries of lecture speech. Training a feature-based summarization model usually requires a large amount of training data with high-quality reference summaries. Producing such summaries by hand is tedious and, because inter-labeler agreement is low, unreliable. Active learning alleviates this problem by automatically selecting a small number of unlabeled documents for humans to correct by hand. Our method chooses the unlabeled documents according to the similarity score between each document and its comparable resource, the accompanying PowerPoint slides. After manual correction, the selected documents are returned to the training pool. Summarization results show an increasing learning curve of ROUGE-L F-measure, from 0.44 to 0.514, consistently higher than that obtained with randomly chosen training samples.
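
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the similarity score is computed or whether high-scoring or low-scoring documents are preferred, so everything below is an assumption made for illustration only: cosine similarity over whitespace-tokenized term counts, selection of the highest-scoring documents, and the hypothetical names bag_of_words, cosine_similarity and select_for_correction. It is not the authors' implementation.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase whitespace tokenization into term counts (a simplification)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (assumed measure)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_for_correction(unlabeled, k):
    """Return the ids of the k documents most similar to their slides.

    `unlabeled` maps a document id to a (transcript, slide_text) pair.
    Preferring the highest scores is an assumption of this sketch; ties
    are broken by document id so the result is deterministic.
    """
    scores = {
        doc_id: cosine_similarity(bag_of_words(transcript), bag_of_words(slides))
        for doc_id, (transcript, slides) in unlabeled.items()
    }
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ranked[:k]


if __name__ == "__main__":
    # Toy pool with made-up transcripts and slide text.
    pool = {
        "lec1": ("we discuss hidden markov models today", "hidden markov models"),
        "lec2": ("some remarks about the exam schedule", "support vector machines"),
    }
    print(select_for_correction(pool, 1))  # prints ['lec1']
```

In a full active-learning loop, the selected documents would be corrected by hand, added to the training pool, and the summarizer retrained before the next selection round.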