A Comparison of Spectro-Temporal Representations of Audio Signals

  • Authors:
  • P. W. J. van Hengel;J. D. Krijnders

  • Affiliations:
  • Cognitive Syst. Group, INCAS (Innovation Center for Adv. Sensors & Sensor Syst.), Assen, Netherlands;Cognitive Syst. Group, INCAS (Innovation Center for Adv. Sensors & Sensor Syst.), Assen, Netherlands

  • Venue:
  • IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
  • Year:
  • 2014

Abstract

This article compares methods for converting time series into a spectro-temporal representation. Such methods are designed either to resemble the auditory processing of sound in the mammalian inner ear or on mathematical principles related to, for example, Fourier analysis. Two tests were devised for the comparison: one based on susceptibility to noise and one on the expression of spectro-temporal detail. Both aspects were considered important for real-world applications. While some methods produced good results on only one of the two tests, others produced good results on both. Overall, the transmission line model using an impedance function suggested by Zweig (“Finding the impedance of the organ of Corti,” J. Acoust. Soc. Amer., vol. 89, no. 3, pp. 1229-1254, 1991) provided the best results, though not significantly better than the alternatives. Its larger computational load may also hinder application in some domains. The gammatone filterbank and the straightforward spectrogram provide good alternatives with a lower computational load. Introducing nonlinearity was shown to degrade performance on both tests, in both the filterbank and the transmission line model.
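
To make the idea of converting a time series into a spectro-temporal representation concrete, here is a minimal sketch of a plain short-time Fourier spectrogram, the simplest of the method families named in the abstract. It is not the authors' implementation: the function name `spectrogram`, the Hann window, the frame length, and the hop size are all assumptions chosen for illustration, since the abstract gives no parameters.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Power spectrogram via a short-time Fourier transform.

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided FFT per frame; rows are time frames, columns are frequency bins.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return power, freqs, times

# Usage: a one-second 1 kHz tone sampled at 8 kHz (synthetic test signal).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
power, freqs, times = spectrogram(tone, sample_rate)
peak_hz = freqs[np.argmax(power.mean(axis=0))]
```

With these settings the frequency bins are spaced 31.25 Hz apart, so the tone falls exactly on a bin and the time-averaged spectrum peaks there. The fixed frame length sets a single time-frequency resolution for all bins, which is one way such a representation differs from filterbank or transmission line models of the inner ear.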