Representing musical sounds with an interpolating state model

Authors:
Anssi Klapuri;Tuomas Virtanen
Affiliations:
Department of Electronic Engineering, Queen Mary, University of London, London, UK and Department of Signal Processing, Tampere University of Technology, Tampere, Finland;Department of Signal Processing, Tampere University of Technology, Tampere, Finland
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

A computationally efficient algorithm is proposed for modeling and representing time-varying musical sounds. The aim is to encode individual sounds rather than the statistical properties of several sounds representing a certain class. A given sequence of acoustic feature vectors is modeled by finding a set of "states" (anchor points in the feature space) such that the input data can be efficiently represented by interpolating between them. The proposed interpolating state model is generic and can be used to represent any multidimensional data sequence. In this paper, it is applied to represent musical instrument sounds in a compact and accurate form. Simulation experiments show that the proposed method clearly outperforms the conventional vector quantization approach, in which the acoustic feature data are clustered with k-means and each feature vector is replaced by its corresponding cluster centroid. The computational complexity of the proposed algorithm as a function of the input sequence length T is O(T log T).
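The contrast between the two representations can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how the states are found, so the sketch simply assumes anchor states sampled at evenly spaced frames and joined by linear interpolation. The toy feature trajectory, the helper names (`interpolate_states`, `kmeans_quantize`, `mse`), and the plain Lloyd-style k-means baseline are likewise assumptions made for illustration only.

```python
import numpy as np

def interpolate_states(states, anchor_times, T):
    """Rebuild T frames by linear interpolation between anchor states."""
    t = np.arange(T)
    # np.interp is one-dimensional, so each feature dimension is handled separately.
    columns = [np.interp(t, anchor_times, states[:, d]) for d in range(states.shape[1])]
    return np.stack(columns, axis=1)

def kmeans_quantize(X, k, iters=50, seed=0):
    """Vector quantization baseline: replace each frame by its k-means centroid."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids[dists.argmin(axis=1)]

def mse(A, B):
    """Mean squared error over all frames and feature dimensions."""
    return float(np.mean((A - B) ** 2))

# Hypothetical toy data: a smooth two-dimensional feature trajectory of T frames.
T, k = 200, 5
t = np.linspace(0.0, 1.0, T)
X = np.stack([np.exp(-3.0 * t), t ** 2], axis=1)

# Assumption: anchors sit at evenly spaced frames and take the observed values there.
anchor_times = np.linspace(0, T - 1, k).round().astype(int)
states = X[anchor_times]

X_interp = interpolate_states(states, anchor_times, T)
X_vq = kmeans_quantize(X, k)

err_interp = mse(X, X_interp)
err_vq = mse(X, X_vq)
print(f"interpolation MSE: {err_interp:.6f}")
print(f"k-means VQ MSE:    {err_vq:.6f}")
```

On a smoothly varying trajectory like this one, a piecewise-constant codebook has to approximate a continuous curve with a few fixed points, whereas interpolation follows the curve between anchors. The sketch only illustrates that intuition on synthetic data and says nothing about the experimental results reported in the paper.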