In recent years, deep neural networks (DNNs), that is, multilayer perceptrons (MLPs) with many hidden layers, have been successfully applied to several speech tasks, e.g., phoneme recognition, out-of-vocabulary word detection, and confidence measures. In this paper, we show that DNNs can be used to boost the classification accuracy of basic speech units, such as phonetic attributes (phonological features) and phonemes. This boosting leads to higher flexibility and has the potential to integrate both top-down and bottom-up knowledge into the Automatic Speech Attribute Transcription (ASAT) framework, a new family of lattice-based speech recognition systems grounded on accurate detection of speech attributes. We compare DNNs and shallow MLPs within the ASAT framework on the classification of phonetic attributes and phonemes. Several DNN architectures, ranging from five to seven hidden layers with up to 2048 hidden units per hidden layer, are presented and evaluated. Experimental evidence on the speaker-independent Wall Street Journal corpus demonstrates that DNNs achieve significant improvements over shallow MLPs with a single hidden layer, producing frame-level attribute estimation accuracies greater than 90% for all 21 phonetic features tested. A similar improvement is observed on the phoneme classification task, where DNNs reach a frame-level accuracy of 86.6%. When this improved phoneme prediction is integrated into a standard large vocabulary continuous speech recognition (LVCSR) system through a word lattice rescoring framework, the resulting word recognition accuracy is better than previously reported word lattice rescoring results.
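To make the shallow-versus-deep comparison concrete, the following is a minimal sketch of a frame-level classifier whose depth is set by a list of layer sizes, together with the frame-level accuracy measure the abstract reports. Everything specific here is an assumption for illustration and is not taken from the paper: the sigmoid hidden units, the softmax output, the random untrained weights, the toy dimensions (`FEATURE_DIM`, `HIDDEN_UNITS`, `NUM_CLASSES`), and the function names. The abstract only states the number of hidden layers (one for the shallow MLP, five to seven for the DNNs) and the maximum layer width (2048).

```python
import math
import random


def init_mlp(layer_sizes, seed=0):
    """Build a list of (weights, biases) pairs with small random weights (untrained)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def affine(weights, biases, x):
    """Compute W x + b for one input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, frame):
    """Map one acoustic feature frame to class posteriors.

    Hidden layers use sigmoid units (an assumption); the last layer is a softmax.
    """
    h = frame
    for weights, biases in layers[:-1]:
        h = [1.0 / (1.0 + math.exp(-v)) for v in affine(weights, biases, h)]
    weights, biases = layers[-1]
    return softmax(affine(weights, biases, h))


def frame_accuracy(layers, frames, labels):
    """Fraction of frames whose most probable class equals the reference label."""
    correct = 0
    for frame, label in zip(frames, labels):
        posteriors = forward(layers, frame)
        predicted = max(range(len(posteriors)), key=posteriors.__getitem__)
        correct += predicted == label
    return correct / len(frames)


# Toy sizes so the sketch runs quickly; the paper uses up to 2048 units per layer.
FEATURE_DIM, HIDDEN_UNITS, NUM_CLASSES = 8, 16, 2  # 2 classes: attribute present / absent

shallow = init_mlp([FEATURE_DIM, HIDDEN_UNITS, NUM_CLASSES])            # one hidden layer
deep = init_mlp([FEATURE_DIM] + [HIDDEN_UNITS] * 5 + [NUM_CLASSES])     # five hidden layers

# Synthetic frames and labels, only to exercise the code.
rng = random.Random(1)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)] for _ in range(20)]
labels = [rng.randrange(NUM_CLASSES) for _ in frames]

print(frame_accuracy(shallow, frames, labels), frame_accuracy(deep, frames, labels))
```

Because the weights are random and the data synthetic, the printed accuracies carry no meaning; the sketch only shows how the two model shapes differ and how a frame-level accuracy would be computed once trained detectors are available.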