Improving speech playback using time-compression and speech recognition

Authors:
Sunil Vemuri;Philip DeCamp;Walter Bender;Chris Schmandt
Affiliations:
MIT Media Lab, Cambridge, MA;MIT Media Lab, Cambridge, MA;MIT Media Lab, Cambridge, MA;MIT Media Lab, Cambridge, MA
Venue:
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Year:
2004

Abstract

Despite the ready availability of digital recording technology and the continually decreasing cost of digital storage, browsing audio recordings remains a tedious task. This paper presents evidence in support of a system designed to assist with information comprehension and retrieval tasks from a large collection of recorded speech. Two techniques are employed to assist users with these tasks. First, a speech recognizer creates necessarily error-laden transcripts of the recorded speech. Second, audio playback is time-compressed using the SOLAFS technique. When the two techniques are used together, subjects perform comprehension tasks with greater speed and accuracy.
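
The time-compression step mentioned in the abstract can be illustrated with a minimal overlap-add sketch. This is not the SOLAFS implementation used in the paper; it is a simplified illustration of the general idea behind synchronized overlap-add methods. Frames are laid down at a fixed spacing in the output, and each frame's starting point in the input is shifted within a small search window so that it lines up with the audio already written. The function name `time_compress` and the parameters `frame`, `overlap`, and `search` are assumptions made for this example, as is the use of an unnormalized cross-correlation as the similarity measure.

```python
import math


def time_compress(samples, speed, frame=400, overlap=100, search=50):
    """Shorten `samples` by roughly a factor of `speed` using overlap-add.

    Illustrative sketch only: all parameter names and defaults are assumptions.
    """
    if speed < 1.0:
        raise ValueError("this sketch only compresses (speed >= 1.0)")
    synth_hop = frame - overlap        # fixed spacing of frames in the output
    analysis_hop = synth_hop * speed   # nominal spacing of frames in the input
    if len(samples) < frame:
        return list(samples)

    out = list(samples[:frame])        # the first frame is copied unchanged
    k = 1
    while True:
        nominal = int(round(k * analysis_hop))
        if nominal + search + frame > len(samples):
            break
        tail = out[-overlap:]          # end of the output the next frame blends into

        # Search near the nominal position for the input offset whose first
        # `overlap` samples best match the tail (unnormalized cross-correlation).
        best_pos, best_score = nominal, -math.inf
        for pos in range(max(0, nominal - search), nominal + search + 1):
            score = sum(a * b for a, b in zip(tail, samples[pos:pos + overlap]))
            if score > best_score:
                best_pos, best_score = pos, score

        seg = samples[best_pos:best_pos + frame]
        # Linear cross-fade over the overlap region, then append the rest.
        for i in range(overlap):
            w = i / overlap
            out[-overlap + i] = (1 - w) * out[-overlap + i] + w * seg[i]
        out.extend(seg[overlap:])
        k += 1
    return out


# Usage: compress one second of a 200 Hz tone (8 kHz sample rate) to about half length.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
fast = time_compress(tone, speed=2.0)
print(len(tone), len(fast))
```

Aligning each frame before cross-fading is what separates this family of methods from simply dropping samples: the waveform periods stay in step across the joins, so pitch is preserved while duration shrinks.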