A Comparative Study of Probabilistic Ranking Models for Chinese Spoken Document Summarization

Authors:
Shih-Hsiang Lin; Berlin Chen; Hsin-Min Wang
Affiliations:
National Taiwan Normal University; National Taiwan Normal University; Institute of Information Science, Academia Sinica
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2009

Abstract

Extractive document summarization automatically selects a number of indicative sentences, passages, or paragraphs from an original document according to a target summarization ratio and sequences them to form a concise summary. In this article, we present a comparative study of various probabilistic ranking models for spoken document summarization, including supervised classification-based summarizers and unsupervised probabilistic generative summarizers. We also investigate the use of unsupervised summarizers to improve the performance of supervised summarizers when no manual labels are available for training the latter. To this end, we explore a novel training data selection approach for the classification-based summarizers: it leverages the relevance information of spoken sentences to select reliable document-summary pairs derived by the probabilistic generative summarizers. Initial experiments on Mandarin Chinese broadcast news data show encouraging results.
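As a rough illustration of the unsupervised side, the sketch below ranks sentences with one common kind of probabilistic generative model. Each sentence S gets a unigram model, and its score is the log-likelihood of the whole document D under that model: log P(D | S) = sum over words w in D of c(w, D) * log[(1 - lam) * P(w | S) + lam * P(w | C)]. Sentences are then taken in score order until a word budget set by the summarization ratio is filled. Several choices here are assumptions for illustration, not the article's exact formulation: Jelinek-Mercer smoothing with weight `lam`, using the document itself as the background model P(w | C), and the hypothetical helpers `sentence_scores`, `summarize`, and `pseudo_labels`. The 0/1 pseudo-labels only hint at how an unsupervised summarizer's output could stand in for manual labels when training a classifier. They do not reproduce the article's relevance-based training data selection.

```python
import math
from collections import Counter


def sentence_scores(sentences, lam=0.5):
    """Score each tokenized sentence S by log P(D | S).

    D is the whole document (all sentences concatenated). Each sentence
    model is a unigram model interpolated (Jelinek-Mercer) with a
    background model; as an assumption, the background is the document's
    own unigram distribution.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1] so every document word has p > 0")
    doc_counts = Counter(w for s in sentences for w in s)
    doc_len = sum(doc_counts.values())
    background = {w: c / doc_len for w, c in doc_counts.items()}

    scores = []
    for s in sentences:
        s_counts = Counter(s)
        s_len = len(s)
        log_lik = 0.0
        for w, c in doc_counts.items():
            p_sent = s_counts[w] / s_len if s_len else 0.0
            p = (1.0 - lam) * p_sent + lam * background[w]
            log_lik += c * math.log(p)  # every occurrence of w in D contributes
        scores.append(log_lik)
    return scores


def summarize(sentences, ratio=0.3, lam=0.5):
    """Return indices of selected sentences, restored to document order."""
    scores = sentence_scores(sentences, lam)
    budget = ratio * sum(len(s) for s in sentences)  # word budget from the ratio
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        # Always keep the top-ranked sentence; afterwards skip any that overflow.
        if chosen and used + len(sentences[i]) > budget:
            continue
        chosen.append(i)
        used += len(sentences[i])
    return sorted(chosen)


def pseudo_labels(sentences, ratio=0.3, lam=0.5):
    """Hypothetical labels: 1 = picked by the generative summarizer, else 0."""
    picked = set(summarize(sentences, ratio, lam))
    return [1 if i in picked else 0 for i in range(len(sentences))]


# Tokens could be Chinese words or characters from a recognizer transcript.
doc = [["a", "a", "b"], ["c"]]
print(pseudo_labels(doc, ratio=0.5))  # [1, 0]
```

Ranking by P(D | S) alone treats all sentences as equally likely a priori. Since P(S | D) is proportional to P(D | S) * P(S), a fuller generative summarizer could multiply in a sentence prior P(S), for example one built from position or prosodic cues.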