Telephone speech recognition via the combination of knowledge sources in a segmental speech model

Authors:
László Tóth;András Kocsor;Gábor Gosztolya
Affiliations:
Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, H-6720 Szeged, Aradi vértanúk tere 1., Hungary
Venue:
Acta Cybernetica
Year:
2004

Abstract

The currently dominant speech recognition methodology, hidden Markov modeling, treats speech as a stochastic process with very simple mathematical properties. The simplistic assumptions of the model, especially the independence of the observation vectors, have been criticized by many in the literature, and alternative solutions have been proposed. One such alternative is segmental modeling, and the OASIS recognizer we have been working on in recent years belongs to this category. In this paper we go one step further and suggest that speech recognition should be treated as a knowledge source combination problem. We offer a generalized algorithmic framework for this approach and show that both hidden Markov and segmental modeling are special cases of this decoding scheme. In the second part of the paper we describe the current components of the OASIS system and evaluate its performance on a very difficult recognition task, the phonetically balanced sentences of the MTBA Hungarian Telephone Speech Database. Our results show that OASIS outperforms a traditional HMM system in phoneme classification and achieves practically the same recognition scores at the sentence level.