The use of phase in complex spectrum subtraction for robust speech recognition

Authors:
Tristan Kleinschmidt; Sridha Sridharan; Michael Mason
Venue:
Computer Speech and Language
Year:
2011

Abstract

In this paper we propose a new method for utilising phase information to complement traditional magnitude-only spectral subtraction speech enhancement, through complex spectrum subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters; and (c) it is designed to improve estimates of the clean speech magnitude spectrum and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average relative improvement in word accuracy of 20% when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (an assumption shown to hold more closely as the frame rate increases), this paper also proposes a novel method for acquiring the phase information, called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential of the proposed PEDEP technique in ideal conditions. A realistic implementation of CSS with PEDEP shows performance comparable to state-of-the-art spectral subtraction techniques in environments with signal-to-noise ratios (SNRs) of 15 to 20 dB. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
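Why phase matters here follows from the complex additive model Y = S + N: |Y - N|^2 = |Y|^2 + |N|^2 - 2|Y||N|cos(phi_Y - phi_N), so magnitude-only subtraction implicitly assumes the noise phase equals the noisy phase (then |Y - N| = ||Y| - |N||). The NumPy sketch below contrasts magnitude-only subtraction with complex spectrum subtraction, using a noise phase obtained by projecting a reference frame's phase forward in time. It is an illustration under stated assumptions, not the authors' implementation: the delay-projection rule (`project_phase`), the frame sizes, the rectangular window and the synthetic signal are all hypothetical choices, built only from the abstract's mention of sinusoidal analysis and stationarity between observations. The paper's actual PEDEP method may differ.

```python
import numpy as np

N_FFT = 256  # frame length in samples (illustrative choice)
HOP = 64     # frame shift in samples (illustrative choice)


def frame_spectrum(x, start, n_fft=N_FFT):
    """Complex spectrum of one frame. A rectangular window is assumed so that
    bin-centred sinusoids do not leak into neighbouring bins (an idealisation)."""
    return np.fft.rfft(x[start:start + n_fft])


def magnitude_subtraction(Y, noise_mag):
    """Magnitude-only spectral subtraction with half-wave rectification
    (no over-subtraction factor or spectral floor)."""
    return np.maximum(np.abs(Y) - noise_mag, 0.0)


def complex_subtraction(Y, noise_mag, noise_phase):
    """Complex spectrum subtraction: subtract a complex noise estimate and keep
    the magnitude of the result as the clean-speech magnitude estimate."""
    return np.abs(Y - noise_mag * np.exp(1j * noise_phase))


def project_phase(ref_phase, delay, n_fft=N_FFT):
    """Hypothetical delay projection: advance each bin's phase by the amount a
    stationary sinusoid at that bin's centre frequency accumulates over
    `delay` samples, then wrap the result to (-pi, pi]."""
    k = np.arange(n_fft // 2 + 1)
    return np.angle(np.exp(1j * (ref_phase + 2 * np.pi * k * delay / n_fft)))


# Synthetic signal: stationary noise built from bin-centred sinusoids, plus a
# "speech" tone sharing bin 41 with the noise. The tone starts at sample N_FFT,
# so frame 0 of the noisy signal is noise-only.
n = np.arange(N_FFT + 4 * HOP)
noise = sum(a * np.cos(2 * np.pi * k * n / N_FFT + th)
            for k, a, th in [(13, 1.0, 0.5), (41, 0.6, 2.0), (75, 0.8, -1.2)])
speech = np.where(n >= N_FFT, 0.7 * np.cos(2 * np.pi * 41 * n / N_FFT + 0.3), 0.0)
noisy = speech + noise

# Noise magnitude and reference phase taken from the leading noise-only frame.
N_ref = frame_spectrum(noisy, 0)
noise_mag = np.abs(N_ref)

css_err, mss_err = [], []
for m in range(1, 5):
    Y = frame_spectrum(noisy, m * HOP)
    S_true = np.abs(frame_spectrum(speech, m * HOP))
    noise_phase = project_phase(np.angle(N_ref), m * HOP)
    css_err.append(np.max(np.abs(complex_subtraction(Y, noise_mag, noise_phase) - S_true)))
    mss_err.append(np.max(np.abs(magnitude_subtraction(Y, noise_mag) - S_true)))

print("max CSS error:", max(css_err))
print("max magnitude-only SS error:", max(mss_err))
```

Because the synthetic noise here is exactly stationary and bin-centred, the projected phase is exact and CSS recovers the clean magnitude to within rounding error, while magnitude-only subtraction errs wherever speech and noise share a bin with different phases. Real noise is neither stationary nor bin-centred, which is consistent with the abstract's observation that more accurate phase estimates are needed in a wider range of noise conditions.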