Mixed-source multi-document speech-to-text summarization

Authors:
Ricardo Ribeiro;David Martins de Matos
Affiliations:
INESC ID Lisboa/ISCTE/IST, Spoken Language Systems Lab, Lisboa, Portugal;INESC ID Lisboa/IST, Spoken Language Systems Lab, Lisboa, Portugal
Venue:
MMIES '08 Proceedings of the Workshop on Multi-source Multilingual Information Extraction and Summarization
Year:
2008

Abstract

Speech-to-text summarization systems usually take as input the output of an automatic speech recognition (ASR) system, which is affected by issues such as recognition errors, disfluencies, or difficulties in accurately identifying sentence boundaries. To cope with the difficulties of summarizing spoken language, we propose including related, solid background information and applying multi-document summarization techniques to single-document speech-to-text summarization. In this work, we explore the possibilities offered by phonetic information for selecting the background information, and we conduct a perceptual evaluation to better assess the relevance of including that information. Results show that summaries generated using this approach are considerably better than those produced by an up-to-date latent semantic analysis (LSA) summarization method, and suggest that humans prefer summaries restricted to the information conveyed in the input source.
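
The abstract does not say how phonetic information is used to select background material, so the following is only an illustrative sketch of one plausible reading: compare the phone sequence of an ASR sentence with the phone sequences of candidate background sentences using a normalized edit distance, and keep the candidates that are close enough. The function names, the threshold, and the assumption that phone sequences are already available (for example from a pronunciation lexicon) are all hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phonetic_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1]; 1.0 means identical phone sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def select_background(
    asr_phones: Sequence[str],
    candidates: List[Tuple[str, Sequence[str]]],
    threshold: float = 0.5,  # hypothetical cut-off, chosen only for illustration
) -> List[str]:
    """Return candidate texts whose phone sequence is close to the ASR one,
    most similar first."""
    scored = [(phonetic_similarity(asr_phones, phones), text)
              for text, phones in candidates]
    kept = [(score, text) for score, text in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept]
```

Working at the phone level rather than the word level means that an ASR misrecognition that still sounds like the intended words can match the corresponding clean background sentence, which is one reason phonetic matching is attractive for noisy transcripts.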