SCANMail: a voicemail interface that makes speech browsable, readable and searchable

Authors:
Steve Whittaker; Julia Hirschberg; Brian Amento; Litza Stark; Michiel Bacchiani; Philip Isenhour; Larry Stead; Gary Zamchick; Aaron Rosenberg
Affiliations:
AT&T Labs-Research, Florham Park, NJ (all authors)
Venue:
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Year:
2002

Abstract

Increasing amounts of public, corporate, and private speech data are now available on-line. Their usefulness is limited, however, by the lack of tools for browsing and searching them. The goal of our research is to provide tools that overcome the inherent difficulties of speech access by supporting visual scanning, search, and information extraction. We describe a novel principle for the design of UIs to speech data: What You See Is Almost What You Hear (WYSIAWYH). In WYSIAWYH, automatic speech recognition (ASR) generates a transcript of the speech data. The transcript is then used as a visual analogue to that underlying data. A graphical user interface allows users to visually scan, read, annotate, and search these transcripts. Users can also use the transcript to access and play specific regions of the underlying message. We first summarize previous studies of voicemail usage that motivated the WYSIAWYH principle, and describe a voicemail UI, SCANMail, that embodies WYSIAWYH. We then report on a laboratory experiment and a two-month field trial evaluation. SCANMail outperformed a state-of-the-art voicemail system on core voicemail tasks. This was attributable to SCANMail's support for visual scanning, search, and information extraction. Although the ASR transcripts contain errors, they nevertheless improve the efficiency of voicemail processing. Transcripts provide enough information for users either to extract key points directly or to navigate to important regions of the underlying speech, which they can then play.
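
The idea of using a transcript as a visual index into the audio can be illustrated with a minimal sketch. It assumes the recognizer emits a start and end time for each word, which the abstract does not state. The names Word, Message, region_for, and search are hypothetical and are not taken from SCANMail.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str     # word as output by ASR; it may be misrecognized
    start: float  # assumed per-word start time, in seconds
    end: float    # assumed per-word end time, in seconds


@dataclass
class Message:
    caller: str
    words: List[Word]

    def transcript(self) -> str:
        """Text shown to the user for visual scanning and reading."""
        return " ".join(w.text for w in self.words)

    def region_for(self, first: int, last: int, pad: float = 0.25) -> Tuple[float, float]:
        """Map a selected word range (inclusive indices) to an audio span.

        The span is padded slightly so playback does not clip the words,
        and the start is clamped at the beginning of the message.
        """
        start = max(0.0, self.words[first].start - pad)
        end = self.words[last].end + pad
        return (start, end)


def search(messages: List[Message], query: str) -> List[Tuple[int, int]]:
    """Return (message index, word index) for each case-insensitive word match."""
    q = query.lower()
    hits = []
    for m_idx, msg in enumerate(messages):
        for w_idx, word in enumerate(msg.words):
            if word.text.lower().strip(".,?!") == q:
                hits.append((m_idx, w_idx))
    return hits
```

In this sketch, a search hit identifies a word in a transcript, and region_for converts a selection around that word into a time span that a player could seek to. Because the transcript may contain recognition errors, playing the underlying audio remains the way to confirm details such as names and numbers.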