SpeechSkimmer: a system for interactively skimming recorded speech

Authors:
Barry Arons
Affiliations:
Massachusetts Institute of Technology, Cambridge
Venue:
ACM Transactions on Computer-Human Interaction (TOCHI) - Special issue on speech as data
Year:
1997

Citing 35
Cited 66

Attention, intentions, and the structure of discourse

Computational Linguistics
Generalized fisheye views

CHI '86 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
A robust algorithm for accurate endpointing of speech signals

Speech Communication
Envisioning information

Envisioning information
Heuristic evaluation of user interfaces

CHI '90 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
User interface evaluation in the real world: a comparison of four techniques

CHI '91 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
The perspective wall: detail and context smoothly integrated

CHI '91 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Techniques for information retrieval from speech messages

The Lincoln Laboratory Journal
Hyperspeech: navigating in speech-only hypermedia

HYPERTEXT '91 Proceedings of the third annual ACM conference on Hypertext
A morphological analysis of the design space of input devices

ACM Transactions on Information Systems (TOIS) - Special issue on computer-human interaction
A system for retrieving speech documents

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Tools for building asynchronous servers to support speech and audio applications

UIST '92 Proceedings of the 5th annual ACM symposium on User interface software and technology
A magnifier tool for video data

CHI '92 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Finding usability problems through heuristic evaluation

CHI '92 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Skip and scan: cleaning up telephone interfaces

CHI '92 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Wordspotting for voice editing and audio indexing

CHI '92 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Analog input device physical characteristics

ACM SIGCHI Bulletin
Capturing, structuring, and representing ubiquitous audio

ACM Transactions on Information Systems (TOIS)
VoiceNotes: a speech interface for a hand-held voice notetaker

CHI '93 Proceedings of the INTERACT '93 and CHI '93 Conference on Human Factors in Computing Systems
FILOCHAT: handwritten notes provide access to recorded conversations

CHI '94 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Structure out of sound

Structure out of sound
Mathtalk: the design of an interface for reading algebra using speech

ICCHP '94 Proceedings of the 4th international conference on Computers for handicapped persons
Media Streams: an iconic visual language for video representation

Human-computer interaction
Interactively skimming recorded speech

Interactively skimming recorded speech
Audio system for technical readings

Audio system for technical readings
Speaker segmentation for browsing recorded audio

CHI '95 Conference Companion on Human Factors in Computing Systems
Evolutionary engagement in an ongoing collaborative work process: a case study

CSCW '96 Proceedings of the 1996 ACM conference on Computer supported cooperative work
Automatic audio content analysis

MULTIMEDIA '96 Proceedings of the fourth ACM international conference on Multimedia
Open-vocabulary speech indexing for voice and video mail retrieval

MULTIMEDIA '96 Proceedings of the fourth ACM international conference on Multimedia
Augmenting real-world objects: a paper-based audio notebook

Conference Companion on Human Factors in Computing Systems
The importance of percent-done progress indicators for computer-human interfaces

CHI '85 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Usability Engineering

Usability Engineering
Is Usability Engineering Really Worth It?

IEEE Software
The intonational structuring of discourse

ACL '86 Proceedings of the 24th annual meeting on Association for Computational Linguistics
Intonational features of local and global discourse structure

HLT '91 Proceedings of the workshop on Speech and Natural Language

Auditory navigation in hyperspace: design and evaluation of a non-visual hypermedia system for blind users

Assets '98 Proceedings of the third international ACM conference on Assistive technologies
Digital talking books on a PC: a usability evaluation of the prototype DAISY playback software

Assets '98 Proceedings of the third international ACM conference on Assistive technologies
A dynamic grouping technique for ink and audio notes

Proceedings of the 11th annual ACM symposium on User interface software and technology
An intelligent media browser using automatic multimodal analysis

MULTIMEDIA '98 Proceedings of the sixth ACM international conference on Multimedia
An overview of audio information retrieval

Multimedia Systems - Special issue on audio and multimedia
Time-compression: systems concerns, usage, and benefits

Proceedings of the SIGCHI conference on Human Factors in Computing Systems
Towards robust features for classifying audio in the CueVideo system

MULTIMEDIA '99 Proceedings of the seventh ACM international conference on Multimedia (Part 1)
Auto-summarization of audio-video presentations

MULTIMEDIA '99 Proceedings of the seventh ACM international conference on Multimedia (Part 1)
Browsing digital video

Proceedings of the SIGCHI conference on Human Factors in Computing Systems
Comparing presentation summaries: slides vs. reading vs. listening

Proceedings of the SIGCHI conference on Human Factors in Computing Systems
Multimedia information retrieval from recorded presentations (poster session)

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Suede: a Wizard of Oz prototyping tool for speech user interfaces

UIST '00 Proceedings of the 13th annual ACM symposium on User interface software and technology
Nomadic radio: speech and audio interaction for contextual messaging in nomadic environments

ACM Transactions on Computer-Human Interaction (TOCHI) - Special issue on human-computer interaction with mobile systems
Designing presentations for on-demand viewing

CSCW '00 Proceedings of the 2000 ACM conference on Computer supported cooperative work
The audio notebook: paper and pen interaction with structured speech

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Transcript-free search of audio archives for the national gallery of the spoken word

Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries
Mediated voice communication via mobile IP

Proceedings of the 15th annual ACM symposium on User interface software and technology
The "Authoring on the Fly" system for automatic presentation recording

CHI '01 Extended Abstracts on Human Factors in Computing Systems
Portable meeting recorder

Proceedings of the tenth ACM international conference on Multimedia
OntoLog: Temporal Annotation Using Ad Hoc Ontologies and Application Profiles

ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
MyInfo: a personal news interface

CHI '03 Extended Abstracts on Human Factors in Computing Systems
Network-based interaction

The human-computer interaction handbook
MARSYAS: a framework for audio analysis

Organised Sound
RAW: conveying minimally-mediated impressions of everyday life with an audio-photographic tool

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Semantic speech editing

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Temporal Thumbnails: rapid visualization of time-based viewing data

Proceedings of the working conference on Advanced visual interfaces
Calculation of an aggregated level of interest function for recorded events

Proceedings of the 12th annual ACM international conference on Multimedia
Interactive manipulation of replay speed while listening to speech recordings

Proceedings of the 12th annual ACM international conference on Multimedia
10× — Human-Machine Symbiosis

BT Technology Journal
Supporting UI design by sketch and speech recognition

TAMODIA '04 Proceedings of the 3rd annual conference on Task models and diagrams
Mixxx: towards novel DJ interfaces

NIME '03 Proceedings of the 2003 conference on New interfaces for musical expression
Documenting the pen-based interaction

WebMedia '05 Proceedings of the 11th Brazilian Symposium on Multimedia and the web
Time is of the essence: an evaluation of temporal compression algorithms

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Error correction of voicemail transcripts in SCANMail

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
The effect of speech recognition accuracy rates on the usefulness and usability of webcast archives

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Searching in audio: the utility of transcripts, dichotic presentation, and time-compression

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Navigating persistent audio

CHI '06 Extended Abstracts on Human Factors in Computing Systems
One-sided measures for evaluating ranked retrieval effectiveness with spontaneous conversational speech

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
DiMaß: a technique for audio scrubbing and skimming using direct manipulation

Proceedings of the 1st ACM workshop on Audio and music computing multimedia
Accessing speech data using strategic fixation

Computer Speech and Language
Earpod: eyes-free menu selection using touch input and reactive audio feedback

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Towards a quantitative analysis of audio scrolling interfaces

CHI '07 Extended Abstracts on Human Factors in Computing Systems
Analysis of navigability of Web applications for improving blind usability

ACM Transactions on Computer-Human Interaction (TOCHI)
HSTP: hyperspeech transfer protocol

Proceedings of the eighteenth conference on Hypertext and hypermedia
Audio-based methods for navigating and browsing educational multimedia documents

Proceedings of the international workshop on Educational multimedia and multimedia education
Design and evaluation of systems to support interaction capture and retrieval

Personal and Ubiquitous Computing - Special Issue: User-centred design and evaluation of ubiquitous groupware
Collaborative editing for improved usefulness and usability of transcript-enhanced webcasts

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Evaluating audio skimming and frame rate acceleration for summarizing BBC rushes

CIVR '08 Proceedings of the 2008 international conference on Content-based image and video retrieval
Extrinsic summarization evaluation: A decision audit task

ACM Transactions on Speech and Language Processing (TSLP)
Designing and evaluating voice-based virtual communities

CHI '10 Extended Abstracts on Human Factors in Computing Systems
What are the most eye-catching and ear-catching features in the video?: implications for video summarization

Proceedings of the 19th international conference on World wide web
The spoken web: a web for the underprivileged

ACM SIGWEB Newsletter
User driven audio content navigation for spoken web

Proceedings of the international conference on Multimedia
Understanding how webcasts are used as sources of information

Journal of the American Society for Information Science and Technology
Searching for music: how feedback and input-control change the way we search

INTERACT'05 Proceedings of the 2005 IFIP TC13 international conference on Human-Computer Interaction
Accessing multimodal meeting data: systems, problems and possibilities

MLMI'04 Proceedings of the First international conference on Machine Learning for Multimodal Interaction
Task switching in audio based systems

TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue
Navigating multimodal meeting recordings with the meeting miner

FQAS'06 Proceedings of the 7th international conference on Flexible Query Answering Systems
Participants' personal note-taking in meetings and its value for automatic meeting summarisation

Information Technology and Management
“Spindex” (Speech Index) Enhances Menus on Touch Screen Devices with Tapping, Wheeling, and Flicking

ACM Transactions on Computer-Human Interaction (TOCHI)
Spoken Content Retrieval: A Survey of Techniques and Technologies

Foundations and Trends in Information Retrieval
Effective browsing of long audio recordings

Proceedings of the 2nd ACM international workshop on Interactive multimedia on mobile and portable devices
Advanced Mobile Lecture Viewing: Summarization and Two-Way Navigation

International Journal of Handheld Computing Research
Methodological triangulation of the students' use of recorded lectures

International Journal of Learning Technology
Saliency-maximized audio visualization and efficient audio-visual browsing for faster-than-real-time human acoustic event detection

ACM Transactions on Applied Perception (TAP)

Abstract

Listening to a speech recording is much more difficult than visually scanning a document because of the transient and temporal nature of audio. Audio recordings capture the richness of speech, yet it is difficult to directly browse the stored information. This article describes techniques for structuring, filtering, and presenting recorded speech, allowing a user to navigate and interactively find information in the audio domain. It then describes the SpeechSkimmer system for interactively skimming speech recordings. SpeechSkimmer uses speech-processing techniques to allow a user to hear recorded sounds quickly and at several levels of detail. User interaction, through a manual input device, provides continuous real-time control of the speed and detail level of the audio presentation. SpeechSkimmer reduces the time needed to listen by incorporating time-compressed speech, pause shortening, automatic emphasis detection, and nonspeech audio feedback. The article also presents a multilevel structural approach to auditory skimming and user interface techniques for interacting with recorded speech. An observational usability test of SpeechSkimmer is discussed, as well as a redesign and reimplementation of the user interface based on the results of that test.
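
The abstract names pause shortening as one way SpeechSkimmer reduces listening time but gives no algorithmic detail. The sketch below is an illustration of the general idea only, not the article's method: it assumes a simple mean-power threshold for detecting pauses and caps each pause at a fixed number of frames. The function name `shorten_pauses` and the parameters `frame_len`, `threshold`, and `max_pause_frames` are hypothetical.

```python
from typing import List


def shorten_pauses(
    samples: List[float],
    frame_len: int,
    threshold: float,
    max_pause_frames: int,
) -> List[float]:
    """Cap every low-energy run at max_pause_frames frames.

    Assumption: a frame counts as a pause when its mean power is
    below `threshold`. Real systems typically use more robust
    speech detection than this.
    """
    out: List[float] = []
    pause_run = 0  # consecutive low-energy frames seen so far
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)  # mean power
        if energy < threshold:
            pause_run += 1
            if pause_run > max_pause_frames:
                continue  # pause is already long enough: drop this frame
        else:
            pause_run = 0  # speech resumed: reset the pause counter
        out.extend(frame)
    return out


# Toy signal: one frame of "speech", three frames of silence, one frame of "speech".
signal = [1.0] * 4 + [0.0] * 12 + [1.0] * 4
shortened = shorten_pauses(signal, frame_len=4, threshold=0.1, max_pause_frames=1)
```

With these toy values the three silent frames are reduced to one, so the 20-sample signal becomes 12 samples while both speech frames are kept intact.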