Active learning with semi-automatic annotation for extractive speech summarization
ACM Transactions on Speech and Language Processing (TSLP)
We propose using active learning to annotate extractive reference summaries of lecture speech. Training a feature-based summarization model usually requires a large amount of training data with high-quality reference summaries. Producing such summaries by hand is tedious and, because inter-labeler agreement is low, unreliable. Active learning alleviates this problem by automatically selecting a small number of unlabeled documents for humans to hand-correct. Our method chooses the unlabeled documents according to a similarity score between each document and a comparable resource: the accompanying PowerPoint slides. After manual correction, the selected documents are returned to the training pool. Summarization results show an increasing learning curve of ROUGE-L F-measure, from 0.44 to 0.514, consistently higher than that obtained with randomly chosen training samples.