Towards capturing fine phonetic variation in speech using articulatory features

  • Authors:
  • Odette Scharenborg; Vincent Wan; Roger K. Moore

  • Affiliations:
  • Speech and Hearing Research Group, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK (all authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2007

Abstract

The ultimate goal of our research is to develop a computational model of human speech recognition that is able to capture the effects of fine-grained acoustic variation on speech recognition behaviour. As part of this work, we are investigating automatic feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. In the experiments reported here, we analysed the classification results from support vector machines (SVMs) and multilayer perceptrons (MLPs). MLPs have been widely and successfully used for the task of multi-value articulatory feature (AF) classification, while (to the best of our knowledge) SVMs have not. This paper compares the performance of the two classifiers and analyses the results in order to better understand the articulatory representations. The SVMs outperformed the MLPs for five of the seven AF classes we investigated, while using only 8.8-44.2% of the material used for training the MLPs. The structure in the misclassifications of the SVMs and MLPs suggests that there might be a mismatch between the characteristics of the classification systems and the characteristics of the description of the AF values themselves. The analyses showed that some of the misclassified features are inherently confusable given the acoustic space. We conclude that, in order to arrive at a feature set that can be used for a reliable and accurate automatic description of the speech signal, it could be beneficial to move away from quantised representations.
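To make the comparison concrete, below is a minimal sketch (not the authors' code) of how an SVM and an MLP might be compared on a frame-level AF classification task using scikit-learn. The data here is synthetic: the 39-dimensional vectors stand in for acoustic feature frames, and the 5-way labels stand in for one AF class such as manner of articulation. Subsampling the SVM's training set is only an illustrative nod to the paper's finding that the SVMs used a fraction of the MLP training material; the exact fractions and classifier settings are assumptions.

```python
# Hypothetical sketch: comparing an SVM and an MLP on frame-level
# articulatory-feature classification. Data and settings are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 39))    # stand-in for 39-dim acoustic frames
y = rng.integers(0, 5, size=2000)  # stand-in for 5 AF values (e.g., manner)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train the SVM on a subset of the MLP's training data (illustrative of the
# paper's observation that SVMs needed far less training material).
n_svm = len(X_train) // 4
svm = SVC(kernel="rbf").fit(X_train[:n_svm], y_train[:n_svm])

# Train the MLP on the full training set.
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=300,
                    random_state=0).fit(X_train, y_train)

print("SVM accuracy:", svm.score(X_test, y_test))
print("MLP accuracy:", mlp.score(X_test, y_test))
```

One design note: an SVM's decision boundary depends only on its support vectors, which is one plausible reason a margin-based classifier can compete while training on a smaller sample than a fully parametric network; on real data, per-class confusion matrices (rather than plain accuracy) would be needed to expose the structured misclassifications the abstract describes.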