Time and frequency filtering of filter-bank energies for robust HMM speech recognition
Speech Communication - Special issue on noise robust ASR
Channel distortion in real environments is a problem for music information retrieval systems that rely on content-based audio identification. Audio recorded under real conditions is commonly distorted by the channel and by background noise. Recently, Philips published a robust and efficient audio fingerprinting system for audio identification. To extract a robust and efficient audio fingerprint, Philips applied the first derivative (difference) to the frequency-time sequence of perceptual filter-bank energies. In practice, however, this is not sufficient to remove the undesired perturbations. This paper introduces an extension of the Philips audio fingerprint extraction scheme that is more immune to channel distortion. Temporal-filtering techniques for channel normalization are used to lessen the channel effects of real environments.
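As a rough illustration of the two ideas in the abstract, the sketch below derives fingerprint bits from the sign of the frequency-time difference of filter-bank energies, and applies one simple form of temporal channel normalization beforehand. The normalization shown (log energies with the per-band temporal mean subtracted) is an assumption chosen for illustration; the abstract does not say which temporal filters the paper uses. The function names, the toy energy matrix, and the per-band gains are all hypothetical.

```python
import math
import random


def log_mean_normalize(energies):
    """Assumed normalization: log energies minus each band's mean over time.

    A fixed per-band channel gain becomes an additive constant in the log
    domain, so subtracting the temporal mean removes it.
    """
    n_frames = len(energies)
    n_bands = len(energies[0])
    logs = [[math.log(e) for e in frame] for frame in energies]
    means = [sum(logs[n][m] for n in range(n_frames)) / n_frames
             for m in range(n_bands)]
    return [[logs[n][m] - means[m] for m in range(n_bands)]
            for n in range(n_frames)]


def fingerprint_bits(values):
    """Sign of the first difference taken along frequency, then along time.

    values[n][m] is the (possibly normalized) energy of band m in frame n.
    Returns one row of bits per frame, starting at the second frame.
    """
    bits = []
    for n in range(1, len(values)):
        row = []
        for m in range(len(values[n]) - 1):
            d = ((values[n][m] - values[n][m + 1])
                 - (values[n - 1][m] - values[n - 1][m + 1]))
            row.append(1 if d > 0 else 0)
        bits.append(row)
    return bits


def hamming(a, b):
    """Number of differing bits between two equally shaped bit matrices."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Toy data (hypothetical): 20 frames, 5 bands, plus a static per-band gain
# standing in for a linear channel.
rng = random.Random(0)
clean = [[rng.uniform(0.5, 2.0) for _ in range(5)] for _ in range(20)]
gains = [0.3, 1.7, 0.9, 2.5, 0.6]
distorted = [[g * e for g, e in zip(gains, frame)] for frame in clean]

bits_clean = fingerprint_bits(log_mean_normalize(clean))
bits_distorted = fingerprint_bits(log_mean_normalize(distorted))
raw_mismatch = hamming(fingerprint_bits(clean), fingerprint_bits(distorted))
norm_mismatch = hamming(bits_clean, bits_distorted)
```

Here `raw_mismatch` counts the bits that change when the differences are taken on the raw energies, and `norm_mismatch` counts them after normalization. This toy channel is a static gain only; additive background noise and time-varying channels are not modeled.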