Combining speech attribute detection and penalized logistic regression for phoneme recognition

Authors:
Sabato Marco Siniscalchi
Affiliations:
Faculty of Engineering and Architecture, Kore University of Enna, Cittadella Universitaria, Enna, Sicily, Italy
Venue:
Neurocomputing
Year:
2012



Abstract

Over the past few years, there has been a resurgence of interest in designing high-accuracy automatic speech recognition (ASR) systems because of the key role they can play in many real-world applications, such as voiceprint-based biometric identification, language identification, and call scanning. Improving the current state of the art is therefore vital to the success of these applications, yet doing so is not simple with the standard technology based on hidden Markov models (HMMs) trained on short-term spectral features. This paper offers an innovative perspective on how two novel and prominent approaches to ASR, namely speech attribute detection and discriminative training, can be combined into a unified framework that benefits overall speech recognition performance. This goal is achieved by embedding phonetic feature detection into a penalized logistic regression machine (PLRM). The proposed approach is evaluated on both isolated and continuous phoneme recognition tasks. Experimental evidence indicates that the proposed framework achieves state-of-the-art performance in the isolated speech recognition task and outperforms current technology in the continuous speech recognition task.
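To make the idea of feeding attribute detector outputs into a penalized logistic regression more concrete, the following is a minimal sketch and not the paper's implementation. It assumes three hypothetical attribute detectors (voiced, nasal, fricative) whose scores lie in [0, 1], three toy phoneme classes (/m/, /s/, /z/), synthetic data, and an L2 penalty fitted by plain batch gradient descent. The abstract does not specify the detectors, the penalty, or the optimizer, so all of these choices are illustrative assumptions.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(weights, x):
    """Linear score per class; the last weight of each class is the bias."""
    xb = x + [1.0]
    return [sum(w_j * x_j for w_j, x_j in zip(w, xb)) for w in weights]

def train_plr(features, labels, n_classes, penalty=0.01, lr=0.5, epochs=500):
    """Multinomial logistic regression with an L2 penalty (assumed form),
    fitted by batch gradient descent. The bias term is not penalized."""
    n_feats = len(features[0])
    n = len(features)
    weights = [[0.0] * (n_feats + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grads = [[0.0] * (n_feats + 1) for _ in range(n_classes)]
        for x, y in zip(features, labels):
            xb = x + [1.0]
            probs = softmax(class_scores(weights, x))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)
                for j in range(n_feats + 1):
                    grads[k][j] += err * xb[j] / n
        for k in range(n_classes):
            for j in range(n_feats + 1):
                reg = penalty * weights[k][j] if j < n_feats else 0.0
                weights[k][j] -= lr * (grads[k][j] + reg)
    return weights

def predict(weights, x):
    """Index of the most probable class for one attribute-score vector."""
    probs = softmax(class_scores(weights, x))
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical attribute profiles [voiced, nasal, fricative] per phoneme.
PROTOTYPES = {
    0: [0.9, 0.9, 0.1],  # /m/: voiced nasal
    1: [0.1, 0.1, 0.9],  # /s/: unvoiced fricative
    2: [0.9, 0.1, 0.9],  # /z/: voiced fricative
}

# Synthetic "detector outputs": prototypes plus small uniform noise.
rng = random.Random(0)
features, labels = [], []
for label, proto in PROTOTYPES.items():
    for _ in range(20):
        features.append([p + rng.uniform(-0.1, 0.1) for p in proto])
        labels.append(label)

weights = train_plr(features, labels, n_classes=3)
predictions = [predict(weights, PROTOTYPES[k]) for k in sorted(PROTOTYPES)]
```

In this toy setup the classifier sees only attribute scores, never raw spectral features, which mirrors the division of labor described in the abstract: detectors produce phonetic evidence and the penalized classifier maps that evidence to phoneme labels. The penalty weight here is arbitrary and would normally be tuned on held-out data.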