SCANMail: a voicemail interface that makes speech browsable, readable and searchable

  • Authors:
  • Steve Whittaker;Julia Hirschberg;Brian Amento;Litza Stark;Michiel Bacchiani;Philip Isenhour;Larry Stead;Gary Zamchick;Aaron Rosenberg

  • Affiliations:
  • AT&T Labs-Research, Florham Park, NJ;AT&T Labs-Research, Florham Park, NJ;AT&T Labs-Research, Florham Park, NJ;AT&T Labs-Research, Florham Park, NJ;AT&T Labs-Research, Florham Park, NJ;AT&T Labs-Research, Florham Park, NJ;AT&T Labs-Research, Florham Park, NJ;AT&T Labs-Research, Florham Park, NJ;AT&T Labs-Research, Florham Park, NJ

  • Venue:
  • Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
  • Year:
  • 2002

Abstract

Increasing amounts of public, corporate, and private speech data are now available online. Their usefulness is limited, however, by the lack of tools for browsing and searching them. The goal of our research is to provide tools that overcome the inherent difficulties of speech access by supporting visual scanning, search, and information extraction. We describe a novel principle for the design of UIs to speech data: What You See Is Almost What You Hear (WYSIAWYH). In WYSIAWYH, automatic speech recognition (ASR) generates a transcript of the speech data. The transcript is then used as a visual analogue to that underlying data. A graphical user interface allows users to visually scan, read, annotate, and search these transcripts. Users can also use the transcript to access and play specific regions of the underlying message. We first summarize previous studies of voicemail usage that motivated the WYSIAWYH principle, and describe a voicemail UI, SCANMail, that embodies WYSIAWYH. We report on a laboratory experiment and a two-month field trial evaluation. SCANMail outperformed a state-of-the-art voicemail system on core voicemail tasks. This was attributable to SCANMail's support for visual scanning, search, and information extraction. While the ASR transcripts contain errors, they nevertheless improve the efficiency of voicemail processing. Transcripts provide enough information either for users to extract key points or for them to navigate to important regions of the underlying speech, which they can then play directly.
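The abstract does not say how a transcript region is linked back to the audio. The following is a minimal sketch, not SCANMail's implementation, of how the "access and play specific regions" idea could work. It assumes the ASR engine emits word-level start and end times, which is an assumption and not a detail stated here. The names `Word`, `find_term`, and `play_region`, and the 0.5 s padding, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One ASR output token with its start/end time in seconds.

    Hypothetical structure: assumes the recognizer provides word-level timings.
    """
    text: str
    start: float
    end: float


def find_term(transcript: List[Word], term: str) -> List[int]:
    """Return indices of transcript words equal to the search term (case-insensitive)."""
    target = term.lower()
    return [i for i, w in enumerate(transcript) if w.text.lower() == target]


def play_region(transcript: List[Word], first: int, last: int,
                pad: float = 0.5) -> Tuple[float, float]:
    """Map a selected span of transcript words to an audio interval to play.

    Padding is added on both sides because ASR word boundaries (and the
    words themselves) may be wrong; the start is clamped at 0 seconds.
    """
    if not 0 <= first <= last < len(transcript):
        raise IndexError("selection outside transcript")
    start = max(0.0, transcript[first].start - pad)
    end = transcript[last].end + pad
    return start, end


if __name__ == "__main__":
    msg = [Word("call", 0.0, 0.5), Word("me", 0.5, 0.75), Word("at", 0.75, 1.0),
           Word("five", 1.0, 1.5), Word("thirty", 1.5, 2.25)]
    hits = find_term(msg, "Five")                 # [3]
    print(play_region(msg, hits[0], hits[0] + 1))  # (0.5, 2.75)
```

In this sketch the padding trades a little extra listening for robustness. Even if the recognizer misplaces a word boundary or misrecognizes a nearby word, the played interval still covers the speech around the search hit. This is in the spirit of the abstract's point that imperfect transcripts remain useful for navigation.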