Accessing speech data using strategic fixation

Authors:
Steve Whittaker; Julia Hirschberg
Affiliations:
Department of Information Studies, University of Sheffield, 211 Portobello Street, Sheffield S1 4DP, UK; Department of Computer Science, Columbia University, 1214 Amsterdam Avenue, M/C 0401, 450 CS Building, New York, NY 10027, USA
Venue:
Computer Speech and Language
Year:
2007

Abstract

When users access information from text, they engage in strategic fixation, visually scanning the text to focus on regions of interest. However, because speech is both serial and ephemeral, it does not readily support strategic fixation. This paper describes two design principles, indexing and transcript-centric access, that address the problem of speech access by supporting strategic fixation. Indexing involves users constructing external visual indices into speech. Users visually scan these indices to find information-rich regions of speech for more detailed processing and playback. Transcript-centric access involves transcribing speech using automatic speech recognition (ASR) and enriching that transcription with visual cues. The resulting enriched transcript is time-aligned to the original speech, so users can scan either the transcript as a whole or the visual cues it contains, then fixate on and play regions of interest. We tested the effectiveness of these two approaches on a set of reference tasks derived from observations of current voicemail practice. A field trial of JotMail, an index-based interface similar to commercial unified messaging clients, showed that indexing was effective in supporting speech scanning, information extraction and status tracking, but not archive management. However, users found it onerous to take the manual notes JotMail required to provide effective retrieval indices. We therefore built SCANMail, a transcript-based interface that constructs indices automatically, using ASR to generate a transcript of the speech data. SCANMail also uses information extraction techniques to identify regions of potential interest, e.g., telephone numbers, within the transcript. Laboratory and field trials showed that SCANMail overcame most of the problems users reported with JotMail, supporting scanning, information extraction and archiving.
Importantly, our evaluations showed that, despite recognition errors, ASR transcripts are a highly effective tool for browsing. Users exploited the enriched transcript to determine the gist of the underlying speech and as a guide to the regions of speech they most needed to play. Long-term field trials also showed that transcripts are useful for supporting notification and mobile access.
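The time-aligned, enriched transcript described in the abstract can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not SCANMail's implementation: it assumes the recognizer returns per-word start and end times and writes telephone digits as numerals, and the names `Word`, `PHONE_RE`, `seek_time` and `phone_regions` are hypothetical. Because every transcript word carries a timestamp, a region found by visually scanning the text (here, a phone-number match produced by a toy regular expression) maps directly back to an interval of audio to play.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str     # recognized word (an ASR hypothesis, possibly wrong)
    start: float  # start time in the audio, in seconds
    end: float    # end time in the audio, in seconds


# Hypothetical pattern: assumes the recognizer writes digits as numerals,
# grouped as "NNN NNNN" or "NNN NNN NNNN". Real extraction needs far more.
PHONE_RE = re.compile(r"\b(?:\d{3} )?\d{3} \d{4}\b")


def seek_time(words: List[Word], index: int) -> float:
    """Clicking a transcript word starts playback at that word's start time."""
    return words[index].start


def phone_regions(words: List[Word]) -> List[Tuple[str, float, float]]:
    """Find phone-number-like spans in the transcript and map them to audio times."""
    offsets = []  # character offset of each word in the joined transcript
    pos = 0
    for w in words:
        offsets.append(pos)
        pos += len(w.text) + 1  # +1 for the joining space
    text = " ".join(w.text for w in words)

    regions = []
    for m in PHONE_RE.finditer(text):
        # Words whose character span overlaps the match.
        hit = [i for i, off in enumerate(offsets)
               if off < m.end() and off + len(words[i].text) > m.start()]
        regions.append((m.group(), words[hit[0]].start, words[hit[-1]].end))
    return regions


words = [Word("call", 0.0, 0.3), Word("me", 0.3, 0.5), Word("at", 0.5, 0.7),
         Word("555", 0.8, 1.6), Word("1234", 1.6, 2.5), Word("thanks", 2.7, 3.1)]
print(phone_regions(words))  # [('555 1234', 0.8, 2.5)]
print(seek_time(words, 5))   # 2.7
```

Spotting the highlighted region and jumping to its start time is the transcript-based analogue of strategic fixation; the same seek logic would apply to JotMail-style manual notes if each note stored the time at which it was taken.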