Automatic Speech Recognition (ASR) identifies uttered words represented as an acoustic signal; a key requirement for any practical ASR system is the ability to recognise speech accurately in noisy conditions. This paper studies the application of Multi-Nets Artificial Neural Networks (M-N ANNs), a realisation of the multiple-views multiple-learners approach, as Multi-Networks Speech Recognisers (M-NSRs) that provide a real-time, frequency-based noise-robust ASR model. An M-NSR treats the speech features associated with each word as a distinct view and assigns a standalone ANN as the learner that approximates that view; in contrast, multiple-views single-learner (MVSL) ANN-based speech recognisers employ a single ANN to memorise the features of the entire vocabulary. In this research, an M-NSR was built and evaluated on unseen test data corrupted by white, brown, and pink noise; specifically, 27 experiments were conducted on noisy speech to measure the accuracy and recognition rate of the proposed model, and the results were compared in detail with those of an MVSL ANN-based ASR system. In our experiments, the M-NSR improved the average recognition rate by up to 20.14% on noise-corrupted test data. These results indicate that the M-NSR, with its higher degree of generalisability, handles frequency-based noise better than the previous model: it achieves a higher recognition rate under noisy conditions.
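The core idea of the multiple-views multiple-learners scheme described above can be illustrated in code. The following is a minimal sketch, not the paper's implementation: the network sizes, learning rate, training schedule, and the toy feature vectors (standing in for real spectral features such as MFCCs) are all assumptions made for illustration. Each vocabulary word gets its own small feed-forward network trained to fire on that word's features, and recognition picks the word whose network responds most strongly.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class WordNet:
    """One small feed-forward net (the 'learner' for one word's 'view').
    Hidden size and learning rate are illustrative assumptions."""
    def __init__(self, n_in, n_hidden=8, lr=0.5):
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def forward(self, x):
        self.h = sigmoid(x @ self.W1 + self.b1)
        return sigmoid(self.h @ self.W2 + self.b2)

    def train_step(self, x, target):
        # One gradient step on squared error, by backpropagation.
        y = self.forward(x)
        d2 = (y - target) * y * (1.0 - y)
        d1 = d2 * self.W2 * self.h * (1.0 - self.h)
        self.W2 -= self.lr * d2 * self.h
        self.b2 -= self.lr * d2
        self.W1 -= self.lr * np.outer(x, d1)
        self.b1 -= self.lr * d1

class MultiNetRecognizer:
    """M-NSR sketch: a standalone net per word; recognition is the word
    whose net yields the highest score (contrast with one MVSL net
    memorising the whole vocabulary)."""
    def __init__(self, words, n_features):
        self.nets = {w: WordNet(n_features) for w in words}

    def fit(self, features, labels, epochs=200):
        for _ in range(epochs):
            for x, lab in zip(features, labels):
                for w, net in self.nets.items():
                    net.train_step(x, 1.0 if w == lab else 0.0)

    def recognise(self, x):
        return max(self.nets, key=lambda w: self.nets[w].forward(x))

# Toy demo: synthetic 4-dim feature vectors stand in for spectral features.
words = ["yes", "no"]
centres = {"yes": np.array([1.0, 0.0, 1.0, 0.0]),
           "no":  np.array([0.0, 1.0, 0.0, 1.0])}
X, y = [], []
for w in words:
    for _ in range(20):
        X.append(centres[w] + rng.normal(0.0, 0.1, 4))
        y.append(w)

rec = MultiNetRecognizer(words, n_features=4)
rec.fit(X, y)
```

Because each network only has to model one word's view, a misbehaving or retrained network affects a single word rather than the whole vocabulary, which is the structural advantage the abstract attributes to M-NSRs over MVSL recognisers.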