Active learning with semi-automatic annotation for extractive speech summarization

Authors:
Justin Jian Zhang;Pascale Fung
Affiliations:
Dongguan University of Technology, Dongguan, China and The Hong Kong University of Science and Technology;The Hong Kong University of Science and Technology
Venue:
ACM Transactions on Speech and Language Processing (TSLP)
Year:
2012

Citing 36
Cited 0

A training algorithm for optimal margin classifiers

COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
Query by committee

COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
The identification of important concepts in highly structured technical papers

SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Improving Generalization with Active Learning

Machine Learning - Special issue on structured connectionist systems
A trainable document summarizer

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Support-Vector Networks

Machine Learning
Relevance: the whole history

Journal of the American Society for Information Science - Special topic issue on the history of documentation and information science: part II
Selective Sampling Using the Query by Committee Algorithm

Machine Learning
Bayesian Network Classifiers

Machine Learning - Special issue on learning with probabilistic representations
Relevance: a review of and a framework for the thinking on the notion in information science

Readings in information retrieval
Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Automatic abstracting and indexing—survey and recommendations

Communications of the ACM
Communication and prosody: functional aspects of prosody

Speech Communication - Dialogue and prosody
Summarizing scientific articles: experiments with relevance and rhetorical status

Computational Linguistics - Summarization
Query Learning Strategies Using Boosting and Bagging

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Active Learning for Natural Language Parsing and Information Extraction

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Less is More: Active Learning with Support Vector Machines

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Employing EM and Pool-Based Active Learning for Text Classification

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Support vector machine active learning with applications to text classification

The Journal of Machine Learning Research
Unsupervised word sense disambiguation rivaling supervised methods

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Active learning via transductive experimental design

ICML '06 Proceedings of the 23rd international conference on Machine learning
Multi-criteria-based active learning for named entity recognition

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
The Pyramid Method: Incorporating human content selection variation in summarization evaluation

ACM Transactions on Speech and Language Processing (TSLP)
SlideSeer: a digital library of aligned document and presentation pairs

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Advances in confidence measures for large vocabulary

ICASSP '99 Proceedings of the Acoustics, Speech, and Signal Processing, 1999. on 1999 IEEE International Conference - Volume 02
Evaluation of active learning strategies for video indexing

Image Communication
Correlation between ROUGE and human evaluation of extractive meeting summaries

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
An analysis of active learning strategies for sequence labeling tasks

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Summarizing speech without text using hidden Markov models

NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
Speech summarization without lexical features for Mandarin broadcast news

NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Optimistic active learning using mutual information

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Active learning of extractive reference summaries for lecture speech summarization

BUCC '09 Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
Extractive summarization of broadcast news: comparing strategies for European portuguese

TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue
ChiMerge: discretization of numeric attributes

AAAI'92 Proceedings of the tenth national conference on Artificial intelligence
Estimating continuous distributions in Bayesian classifiers

UAI'95 Proceedings of the Eleventh conference on Uncertainty in artificial intelligence
A new approach to automatic speech summarization

IEEE Transactions on Multimedia

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose using active learning for extractive speech summarization in order to reduce human effort in generating reference summaries. Active learning chooses a selective set of samples to be labeled. We propose a combination of informativeness and representativeness criteria for selection. We further propose a semi-automatic method to generate reference summaries for presentation speech by using Relaxed Dynamic Time Warping (RDTW) alignment between presentation speech and its accompanied slides. Our summarization results show that the amount of labeled data needed for a given summarization accuracy can be reduced by more than 23% compared to random sampling.