Saliency-maximized audio visualization and efficient audio-visual browsing for faster-than-real-time human acoustic event detection

Authors:
Kai-Hsiang Lin;Xiaodan Zhuang;Camille Goudeseune;Sarah King;Mark Hasegawa-Johnson;Thomas S. Huang
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign
Venue:
ACM Transactions on Applied Perception (TAP)
Year:
2013

Abstract

Browsing large audio archives is challenging because of the limitations of human audition and attention. However, this task becomes easier with a suitable visualization of the audio signal, such as a spectrogram transformed to make unusual audio events salient. This transformation maximizes the mutual information between an isolated event's spectrogram and an estimate of how salient the event appears in its surrounding context. When such spectrograms are computed and displayed with fluid zooming over many temporal orders of magnitude, sparse events in long audio recordings can be detected more quickly and more easily. In particular, in a 1/10-real-time acoustic event detection task, subjects who were shown saliency-maximized rather than conventional spectrograms performed significantly better. Saliency maximization also improves the mutual information between the ground truth of nonbackground sounds and visual saliency, more than other common enhancements to visualization.
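
The abstract's final comparison is stated in terms of the mutual information between ground-truth event labels and visual saliency. As a rough illustration only, the sketch below computes a plug-in (histogram-count) estimate of mutual information between two paired discrete sequences, for example a per-frame event/background label and a quantized per-frame saliency value. The function names `mutual_information` and `quantize`, the per-frame pairing, and the uniform quantization are assumptions made for this example; the abstract does not describe the authors' actual estimator or saliency model.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    Hypothetical helper for illustration; not the authors' estimator.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")

    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def quantize(values, levels=4):
    """Map real values to integer bins 0..levels-1 by uniform quantization.

    The uniform binning is an assumption chosen for simplicity.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * levels), levels - 1) for v in values]
```

With toy inputs, a saliency sequence that tracks the labels exactly yields one bit for a balanced binary label, while a sequence statistically independent of the labels yields zero. A visualization whose saliency carries more information about where the nonbackground sounds are would score higher under a measure of this kind.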