Speech enhancement using Gaussian scale mixture models
IEEE Transactions on Audio, Speech, and Language Processing
This paper addresses the decomposition of music signals into pitched sound objects made of harmonic sinusoidal partials, for the purpose of very low bit-rate coding. After a brief review of existing methods, we recast this problem in a Bayesian framework. We propose a family of probabilistic signal models that combine learned object priors with various perceptually motivated distortion measures. We design efficient algorithms to infer the object parameters and build a coder based on the interpolation of frequency and amplitude parameters. Listening tests suggest that the loudness-based distortion measure outperforms the other distortion measures and that our coder yields better sound quality than baseline transform and parametric coders at 8 and 2 kbit/s. This work constitutes a new step towards a fully object-based coding system, which would represent audio signals as collections of meaningful note-like sound objects.
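To make the notion of a pitched sound object concrete, the following is a minimal sketch of how one such object could be resynthesized from frame-wise parameters. It is an illustration under stated assumptions, not the coder described in the abstract: the function name `synth_harmonic_object`, the frame layout, and the use of plain linear interpolation between frames are all hypothetical choices, and the partials are assumed to lie at exact integer multiples of the fundamental frequency.

```python
import numpy as np


def synth_harmonic_object(f0_frames, amp_frames, hop, sr):
    """Resynthesize one pitched object from frame-wise parameters.

    f0_frames  : (T,) fundamental frequency in Hz at each frame (assumed layout)
    amp_frames : (T, H) amplitude of each of H harmonic partials at each frame
    hop        : number of samples between consecutive frames
    sr         : sampling rate in Hz
    """
    f0_frames = np.asarray(f0_frames, dtype=float)
    amp_frames = np.asarray(amp_frames, dtype=float)
    n_frames, n_partials = amp_frames.shape

    # Sample positions of the frames and of the output signal.
    frame_pos = np.arange(n_frames) * hop
    t = np.arange((n_frames - 1) * hop + 1)

    # Assumption: linear interpolation of the fundamental between frames.
    f0 = np.interp(t, frame_pos, f0_frames)

    # Phase of the fundamental as the running sum of instantaneous frequency.
    phase = 2.0 * np.pi * np.cumsum(f0) / sr

    signal = np.zeros(t.shape[0])
    for h in range(n_partials):
        # Assumption: linear interpolation of each partial's amplitude.
        amp = np.interp(t, frame_pos, amp_frames[:, h])
        # Mute a partial wherever it would exceed the Nyquist frequency.
        amp = np.where((h + 1) * f0 < sr / 2.0, amp, 0.0)
        signal += amp * np.sin((h + 1) * phase)
    return signal
```

For example, `synth_harmonic_object([220, 220, 220], 0.5 * np.ones((3, 2)), hop=80, sr=8000)` returns a 161-sample signal containing two partials at 220 Hz and 440 Hz. Only the frame-wise frequencies and amplitudes need to be stored, which is what makes a parametric representation of this kind attractive at very low bit rates.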