Learning dictionaries of stable autoregressive models for audio scene analysis

  • Authors:
  • Youngmin Cho; Lawrence K. Saul

  • Affiliations:
  • University of California, San Diego, La Jolla, CA (both authors)

  • Venue:
  • ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
  • Year:
  • 2009


Abstract

In this paper, we explore an application of basis pursuit to audio scene analysis. The goal of our work is to detect when certain sounds are present in a mixed audio signal. We focus on the regime where, out of a large number of possible sources, a small but unknown number combine and overlap to yield the observed signal. To infer which sounds are present, we decompose the observed signal as a linear combination of a small number of active sources. We cast the inference as a regularized form of linear regression whose sparse solutions yield decompositions with few active sources. We characterize the acoustic variability of individual sources by autoregressive models of their time-domain waveforms. When we do not have prior knowledge of the individual sources, the coefficients of these autoregressive models must be learned from audio examples. We analyze the dynamical stability of these models and show how to estimate stable models by substituting a simple convex optimization for a difficult eigenvalue problem. We demonstrate our approach by learning dictionaries of musical notes and using these dictionaries to analyze polyphonic recordings of piano, cello, and violin.
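The regularized regression described in the abstract is the basis pursuit denoising problem: find a sparse coefficient vector a minimizing 0.5||x - Da||^2 + lam*||a||_1. Below is a minimal numpy sketch using an ISTA-style solver; the abstract does not specify an optimizer, so the solver choice, the variable names `D`, `x`, and `lam`, and the iteration count are all illustrative. The framing of `D` as a fixed matrix whose columns represent candidate sources is also a simplification, since the paper characterizes sources by autoregressive models rather than fixed waveforms.

```python
import numpy as np

def ista(D, x, lam=0.1, n_iter=500):
    """L1-regularized least squares (basis pursuit denoising) via ISTA.

    Solves min_a 0.5*||x - D a||^2 + lam*||a||_1.
    Sparse solutions (most entries of a exactly zero) indicate
    which few sources are active in the mixture.
    """
    L = np.linalg.norm(D, 2) ** 2   # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)    # gradient of the quadratic term
        z = a - grad / L            # gradient step
        # Soft thresholding: the proximal operator of the L1 penalty.
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return a
```

Larger values of `lam` drive more coefficients to exactly zero, trading reconstruction accuracy for a decomposition with fewer active sources.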
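Each source in the dictionary is characterized by an autoregressive (AR) model of its time-domain waveform: x[t] is predicted as a linear combination of the p previous samples, x[t] ≈ Σ_{k=1}^p a_k x[t−k]. The sketch below shows the standard least-squares fit of the AR coefficients from an audio example; the function name `fit_ar` and the order `p` are placeholders, and the paper's estimation procedure additionally enforces stability (see the next sketch).

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of an order-p autoregressive model.

    Models x[t] ~ sum_k a[k] * x[t-k-1] and returns the p
    coefficients minimizing the squared prediction error.
    """
    n = len(x)
    # Regression matrix of lagged samples: column k holds x[t-k-1]
    # for each target index t = p, ..., n-1.
    X = np.column_stack([x[p - k - 1 : n - k - 1] for k in range(p)])
    y = x[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a
```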
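Stability of an order-p AR model is governed by the eigenvalues of its p-by-p companion matrix: the model is stable if and only if all eigenvalues lie strictly inside the unit circle. Constraining eigenvalues directly during learning is a hard nonconvex problem, which the paper replaces with a simple convex optimization. The sketch below is an assumption-laden stand-in for that idea, not the paper's exact formulation: it uses the well-known convex sufficient condition Σ_k |a_k| < 1 and projects the fitted coefficients onto the corresponding L1 ball, which guarantees stability but is generally more conservative than the paper's constraint.

```python
import numpy as np

def is_stable(a):
    """Exact stability test: all eigenvalues of the companion matrix
    of the AR coefficients must lie strictly inside the unit circle."""
    p = len(a)
    C = np.zeros((p, p))
    C[0, :] = a               # first row holds the AR coefficients
    C[1:, :-1] = np.eye(p - 1)  # subdiagonal shifts the state
    return np.max(np.abs(np.linalg.eigvals(C))) < 1.0

def project_l1_ball(a, radius=0.999):
    """Project coefficients onto the L1 ball of the given radius.

    sum_k |a_k| < 1 is a convex *sufficient* condition for stability,
    used here as a stand-in for the paper's convex constraint.
    Uses the standard sort-based Euclidean projection onto the L1 ball.
    """
    if np.sum(np.abs(a)) <= radius:
        return a
    u = np.sort(np.abs(a))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(a) + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1)
    return np.sign(a) * np.maximum(np.abs(a) - theta, 0.0)
```

Combining the two sketches, one could fit coefficients with `fit_ar` and, whenever `is_stable` fails, replace them with `project_l1_ball(a)`; the resulting model is guaranteed stable, at the possible cost of some prediction accuracy.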