Segmentation, indexing, and retrieval for environmental and natural sounds

Authors:
Gordon Wichern; Jiachen Xue; Harvey Thornburg; Brandon Mechtley; Andreas Spanias
Affiliations:
School of Arts, Media and Engineering, Arizona State University, Tempe, AZ (all authors)
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

We propose a method for characterizing sound activity in fixed spaces through segmentation, indexing, and retrieval of continuous audio recordings. For segmentation, we present a dynamic Bayesian network (DBN) that jointly infers the onset and end times of the most prominent sound events in the space, along with an extension of the algorithm that covers large spaces with distributed microphone arrays. Each segmented sound event is indexed with a hidden Markov model (HMM) that models the distribution of example-based queries a user would employ to retrieve that event (or similar events). To make the retrieval search more efficient, we recursively apply a modified spectral clustering algorithm that groups similar sound events based on the distance between their corresponding HMMs. We then conduct a formal user study to obtain the relevance judgments needed to evaluate our retrieval algorithm on both automatically and manually segmented sound clips. Our segmentation and retrieval algorithms are shown to be effective in both quiet indoor and noisy outdoor recording conditions.
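The indexing idea in the abstract, where each stored sound event has its own HMM and a query is matched against those models, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes discrete observation symbols rather than real acoustic features, ranks events by the forward-algorithm log-likelihood of the query, and every name in it (`hmm_log_likelihood`, `rank_by_query`, `INDEX`, the "rising" and "falling" models) is hypothetical.

```python
import math


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising an error."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(obs, start, trans, emit):
    """Forward algorithm in log space for a discrete-emission HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    Returns log P(obs | model).
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][symbol])
            for j in range(n)
        ]
    return logsumexp(alpha)


def rank_by_query(query, index):
    """Return event names sorted from most to least likely to generate the query."""
    scores = {name: hmm_log_likelihood(query, *model) for name, model in index.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical index: two left-to-right HMMs over symbols 0 ("low") and 1 ("high").
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
INDEX = {
    "rising": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    "falling": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}

if __name__ == "__main__":
    # A low-then-high query should rank the "rising" event first.
    print(rank_by_query([0, 0, 1, 1], INDEX))
```

Scoring the query against every stored model is a linear scan. The clustering step described in the abstract is meant to reduce that cost by grouping similar events; it is not shown in this sketch.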