Gaussian Mixture Modeling of Short-Time Fourier Transform Features for Audio Fingerprinting

  • Authors:
  • A. Ramalingam; S. Krishnan

  • Affiliations:
  • Dept. of Electr. & Comput. Eng., Ryerson Inst., Toronto, Ont.

  • Venue:
  • IEEE Transactions on Information Forensics and Security
  • Year:
  • 2006

Abstract

In audio fingerprinting, an audio clip must be recognized by matching an extracted fingerprint to a database of previously computed fingerprints. The fingerprints should reduce the dimensionality of the input significantly, provide discrimination among different audio clips, and, at the same time, be invariant to distorted versions of the same audio clip. In this paper, we design fingerprints addressing the above issues by modeling an audio clip by Gaussian mixture models (GMM). We evaluate the performance of many easy-to-compute short-time Fourier transform features, such as Shannon entropy, Rényi entropy, spectral centroid, spectral bandwidth, spectral flatness measure, spectral crest factor, and Mel-frequency cepstral coefficients in modeling audio clips using GMM for fingerprinting. We test the robustness of the fingerprints under a large number of distortions. To make the system robust, we use some of the distorted versions of the audio for training. However, we show that the audio fingerprints modeled using GMM are not only robust to the distortions used in training but also to distortions not used in training. Among the features tested, spectral centroid performs best with an identification rate of 99.2% at a false positive rate of 10⁻⁴. All of the features give an identification rate of more than 90% at a false positive rate of 10⁻³.
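
As a rough illustration of several of the short-time Fourier transform features named in the abstract, the sketch below computes spectral centroid, spectral bandwidth, Shannon entropy, spectral flatness, and spectral crest factor for a single frame. It is not the authors' implementation. The function names (`magnitude_spectrum`, `frame_features`), the Hann window, the use of the power spectrum as the weighting, and the small floor added to avoid `log(0)` are all assumptions made for this example; the exact definitions used in the paper may differ. A naive DFT keeps the sketch dependency-free, and an FFT routine would normally replace it.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of a naive DFT of one Hann-windowed frame (bins 0..N/2)."""
    n = len(frame)
    # Hann window (an assumption; the abstract does not name a window).
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def frame_features(frame, sample_rate):
    """Return a dict of per-frame spectral features (illustrative definitions)."""
    mag = magnitude_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mag))]

    # Power spectrum with a tiny floor so logarithms stay finite.
    power = [m * m + 1e-12 for m in mag]
    total = sum(power)
    mean_power = total / len(power)
    # Normalized power, treated as a probability mass over frequency bins.
    p = [v / total for v in power]

    centroid = sum(f * w for f, w in zip(freqs, p))
    bandwidth = math.sqrt(sum((f - centroid) ** 2 * w
                              for f, w in zip(freqs, p)))
    entropy = -sum(w * math.log2(w) for w in p)
    # Flatness: geometric mean of the power spectrum over its arithmetic mean.
    geometric_mean = math.exp(sum(math.log(v) for v in power) / len(power))
    flatness = geometric_mean / mean_power
    # Crest factor: peak power over mean power.
    crest = max(power) / mean_power

    return {
        "centroid": centroid,
        "bandwidth": bandwidth,
        "entropy": entropy,
        "flatness": flatness,
        "crest": crest,
    }


if __name__ == "__main__":
    sample_rate = 8000  # hypothetical sampling rate in Hz
    tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
    print(frame_features(tone, sample_rate))
```

Computed over successive frames of a clip, each feature yields a sequence of values whose distribution could then be modeled with a GMM, for example using an off-the-shelf mixture-model fitter. That step is omitted here because the abstract does not specify the number of mixture components or the training procedure.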