Short Communication: Feature selection for reduced-bandwidth distributed speech recognition

Authors:
Ronan Flynn;Edward Jones
Affiliations:
Department of Electronic Engineering, Athlone Institute of Technology, Ireland;College of Engineering & Informatics, National University of Ireland, Galway, Ireland
Venue:
Speech Communication
Year:
2012

Abstract

This paper examines how two methods for reducing the dimension of the feature vectors affect speech recognition performance in a distributed speech recognition (DSR) environment. The motivation for reducing the dimension of the feature set is to reduce the bandwidth required to send the feature vectors over a channel from the client front-end to the server back-end in a DSR system. In the first approach, the features are chosen empirically to maximise recognition performance. In the second, a data-centric, transform-based dimensionality-reduction technique is applied. Test results for the empirical approach show that individual coefficients have different impacts on speech recognition performance, and that certain coefficients should always be present in an empirically selected reduced feature set for given training and test conditions. Initial results show that, with the empirical method, the number of elements in a feature vector produced by an established DSR front-end can be reduced by 23% with low impact on recognition performance (less than an 8% relative performance drop compared to the full-bandwidth case). With the transform-based approach, the number of feature vector elements can be reduced by 30% for a similar impact on recognition performance. Furthermore, the results indicate that, for best recognition performance, the SNR of the speech signal should be considered with either approach when selecting the feature vector elements to include in a reduced feature set.
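
The abstract does not say how the empirical selection is carried out, so the following is only an illustrative sketch of one common way to pick a reduced feature set: greedy backward elimination. The function name `backward_eliminate`, the `evaluate` callback, and the `TOY_IMPORTANCE` values are all hypothetical; in a real experiment `evaluate` would train and test a recogniser on the chosen coefficients under the given training and test conditions.

```python
from typing import Callable, List, Sequence


def backward_eliminate(
    n_features: int,
    n_keep: int,
    evaluate: Callable[[Sequence[int]], float],
) -> List[int]:
    """Greedily drop one feature index at a time until n_keep remain."""
    if not 0 < n_keep <= n_features:
        raise ValueError("n_keep must be between 1 and n_features")
    kept = list(range(n_features))
    while len(kept) > n_keep:
        # Try removing each remaining index and keep the subset that scores best,
        # i.e. remove the coefficient whose absence hurts the score least.
        candidates = [[i for i in kept if i != drop] for drop in kept]
        kept = max(candidates, key=evaluate)
    return kept


# Hypothetical per-coefficient "importance" values standing in for a recogniser.
# These numbers are made up for illustration and are not results from the paper.
TOY_IMPORTANCE = [0.9, 0.8, 0.1, 0.7, 0.05, 0.6, 0.5, 0.02, 0.4, 0.3]


def toy_score(subset: Sequence[int]) -> float:
    """Toy stand-in for recognition accuracy: sum of the kept importances."""
    return sum(TOY_IMPORTANCE[i] for i in subset)


selected = backward_eliminate(len(TOY_IMPORTANCE), n_keep=7, evaluate=toy_score)
reduction = 1 - len(selected) / len(TOY_IMPORTANCE)
print(selected, f"{reduction:.0%} fewer elements")
```

Because the score is re-evaluated for every candidate subset, the same loop could be run separately for each SNR condition, which is one way to act on the observation that the best reduced feature set depends on the SNR of the speech signal.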