Extrinsic summarization evaluation: A decision audit task

  • Authors:
  • Gabriel Murray (University of British Columbia); Thomas Kleinbauer, Peter Poller, Tilman Becker (German Research Center for Artificial Intelligence, DFKI); Steve Renals, Jonathan Kilgour (University of Edinburgh)

  • Venue:
  • ACM Transactions on Speech and Language Processing (TSLP)
  • Year:
  • 2009


Abstract

In this work, we describe a large-scale extrinsic evaluation of automatic speech summarization technologies for meeting speech. The particular task is a decision audit, wherein a user must satisfy a complex information need, navigating several meetings in order to gain an understanding of how and why a given decision was made. We compare the usefulness of extractive and abstractive technologies in satisfying this information need, and assess the impact of automatic speech recognition (ASR) errors on user performance. We employ several methods to evaluate participant performance, including post-questionnaire data, human subjective and objective judgments, and a detailed analysis of participant browsing behavior. We find that while ASR errors affect user satisfaction on an information retrieval task, users can adapt their browsing behavior to complete the task satisfactorily. Results also indicate that users consider extractive summaries to be intuitive and useful tools for browsing multimodal meeting data. We discuss areas in which automatic summarization techniques can be improved in comparison with gold-standard meeting abstracts.