Tree-based state tying for high accuracy acoustic modelling

Authors:
S. J. Young; J. J. Odell; P. C. Woodland
Affiliations:
Cambridge University, Cambridge, England; Cambridge University, Cambridge, England; Cambridge University, Cambridge, England
Venue:
HLT '94 Proceedings of the workshop on Human Language Technology
Year:
1994

Citing 3
Cited 52

Context dependent modeling of phones in continuous speech using decision trees

HLT '91 Proceedings of the workshop on Speech and Natural Language
Automatic Speech Recognition: The Development of the Sphinx Recognition System

Automatic Speech Recognition: The Development of the Sphinx Recognition System
A one pass decoder design for large vocabulary recognition

HLT '94 Proceedings of the workshop on Human Language Technology

Retrieving spoken documents by combining multiple index sources

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Open-vocabulary speech indexing for voice and video mail retrieval

MULTIMEDIA '96 Proceedings of the fourth ACM international conference on Multimedia
Using tone information in Cantonese continuous speech recognition

ACM Transactions on Asian Language Information Processing (TALIP)
Diphone subspace mixture trajectory models for HMM Complementation

Speech Communication
Korean large vocabulary continuous speech recognition with morpheme-based recognition units

Speech Communication
Clustering of triphones using phoneme similarity estimation for the definition of a multilingual set of triphones

Speech Communication
Decision Tree Based Clustering

IDEAL '02 Proceedings of the Third International Conference on Intelligent Data Engineering and Automated Learning
Taiscéalaí: Information Retrieval from an Archive of Spoken Radio News

ECDL '98 Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries
German and Czech Speech Synthesis Using HMM-Based Speech Segment Database

TSD '02 Proceedings of the 5th International Conference on Text, Speech and Dialogue
Building a New Czech Text-to-Speech System Using Triphone-Based Speech Units

TSD '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
An Automatic Speech Translation System on PDAs for Travel Conversation

ICMI '02 Proceedings of the 4th IEEE International Conference on Multimodal Interfaces
Covariance-Tied Clustering Method In Speaker Identification

ICMI '02 Proceedings of the 4th IEEE International Conference on Multimodal Interfaces
Tree-Based State Clustering Using Self-Organizing Principles for Large Vocabulary On-Line Handwriting Recognition

ICPR '98 Proceedings of the 14th International Conference on Pattern Recognition - Volume 2
1993 benchmark tests for the ARPA spoken language program

HLT '94 Proceedings of the workshop on Human Language Technology
A one pass decoder design for large vocabulary recognition

HLT '94 Proceedings of the workshop on Human Language Technology
Multi-speaker articulatory trajectory formation based on speaker-independent articulatory HMMs

Speech Communication
Acoustic model adaptation based on pronunciation variability analysis for non-native speech recognition

Speech Communication
Language-dependent state clustering for multilingual acoustic modelling

Speech Communication
Acoustic variability and automatic recognition of children's speech

Speech Communication
Generation of Phonetic Units for Mixed-Language Speech Recognition Based on Acoustic and Contextual Analysis

IEEE Transactions on Computers
Tone correctness improvement in speaker dependent HMM-based Thai speech synthesis

Speech Communication
The application of hidden Markov models in speech recognition

Foundations and Trends in Signal Processing
Mandarin short message dictation on Symbian series 60 mobile phones

Mobility '07 Proceedings of the 4th international conference on mobile technology, applications, and systems and the 1st international symposium on Computer human interaction in mobile technology
Specifics of Hidden Markov Model Modifications for Large Vocabulary Continuous Speech Recognition

Informatica
Limited-Vocabulary Estonian Continuous Speech Recognition System using Hidden Markov Models

Informatica
Acoustic Modelling for Croatian Speech Recognition and Synthesis

Informatica
The ASRS_RL --- A Research Platform for Spoken Language Recognition and Understanding Experiments

ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II
Development of a Speech Recognizer with the Tecnovoz Database

PROPOR '08 Proceedings of the 8th international conference on Computational Processing of the Portuguese Language
Improving robustness of MLLR adaptation with speaker-clustered regression class trees

Computer Speech and Language
Context-dependent alignment models for statistical machine translation

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis

Speech Communication
Optimizing multiple pronunciation dictionary based on a confusability measure for non-native speech recognition

AIA '08 Proceedings of the 26th IASTED International Conference on Artificial Intelligence and Applications
Automatic speech recognition for under-resourced languages: application to Vietnamese language

IEEE Transactions on Audio, Speech, and Language Processing
A hybrid approach to adapting acoustic and pronunciation models for non-native speech recognition

Asilomar'09 Proceedings of the 43rd Asilomar conference on Signals, systems and computers
Thousands of voices for HMM-based speech synthesis: analysis and application of TTS systems built on various ASR corpora

IEEE Transactions on Audio, Speech, and Language Processing
Eigenvalues Driven Gaussian Selection in continuous speech recognition using HMMs with full covariance matrices

Applied Intelligence
Decision trees for lexical smoothing in statistical machine translation

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
The subspace Gaussian mixture model - A structured model for speech recognition

Computer Speech and Language
The Romanian speech synthesis (RSS) corpus: Building a high quality HMM-based speech synthesis system using a high sampling rate

Speech Communication
Rule-based triphone mapping for acoustic modeling in automatic speech recognition

TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
Acoustic modeling problem for automatic speech recognition system: conventional methods (Part I)

International Journal of Speech Technology
Statistical modelling in continuous speech recognition (CSR)

UAI'01 Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence
Reliable unseen model prediction for vocabulary-independent speech recognition

AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence
Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping

Speech Communication
Multi-accent acoustic modelling of South African English

Speech Communication
Video mail retrieval using voice: an overview of the stage 2 system

MIRO'95 Proceedings of the Final conference on Multimedia Information Retrieval
A novel split-and-merge algorithm for hierarchical clustering of Gaussian mixture models

Applied Intelligence
ICMI'12 grand challenge: haptic voice recognition

Proceedings of the 14th ACM international conference on Multimodal interaction
Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

Computer Speech and Language
Eigentrigraphemes for under-resourced languages

Speech Communication
Predicting utterance pitch targets in Yorùbá for tone realisation in speech synthesis

Speech Communication
Large vocabulary Russian speech recognition using syntactico-statistical language modeling

Speech Communication

Abstract

The key problem to be faced when building an HMM-based continuous speech recogniser is maintaining the balance between model complexity and available training data. For large vocabulary systems requiring cross-word context-dependent modelling, this is particularly acute since many such contexts will never occur in the training data. This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree. This tree-based clustering is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones. State-tying is also compared with traditional model-based tying and shown to be clearly superior. Experimental results are presented for both the Resource Management and Wall Street Journal tasks.
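
The abstract gives no algorithmic detail, so the following is only a minimal sketch of how a phonetic decision tree might tie states and then map an unseen triphone to a tied state. Everything in it is an assumption made for illustration: the single-Gaussian likelihood-gain criterion, the one-dimensional statistics, the thresholds min_gain and min_occ, the three phonetic questions, and the toy data for a hypothetical centre phone "ih". It is not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class StateStats:
    """Sufficient statistics for one triphone HMM state (1-D for brevity)."""
    left: str    # left-context phone
    right: str   # right-context phone
    occ: float   # state occupancy count from training (assumed given)
    mean: float
    var: float


def pooled_loglik(states):
    """Approximate log likelihood if all states share a single Gaussian."""
    n = sum(s.occ for s in states)
    mean = sum(s.occ * s.mean for s in states) / n
    second_moment = sum(s.occ * (s.var + s.mean ** 2) for s in states) / n
    var = max(second_moment - mean ** 2, 1e-6)  # floor to avoid log(0)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def grow(states, questions, min_gain, min_occ):
    """Recursively split a pool of states with the best phonetic question."""
    base = pooled_loglik(states)
    best = None
    for _name, side, phones in questions:
        yes = [s for s in states if getattr(s, side) in phones]
        no = [s for s in states if getattr(s, side) not in phones]
        if not yes or not no:
            continue  # question does not divide this pool
        if min(sum(s.occ for s in yes), sum(s.occ for s in no)) < min_occ:
            continue  # too little training data on one side
        gain = pooled_loglik(yes) + pooled_loglik(no) - base
        if best is None or gain > best[0]:
            best = (gain, side, phones, yes, no)
    if best is None or best[0] < min_gain:
        # Leaf: every state that reaches it shares one tied distribution.
        return {"tied": [(s.left, s.right) for s in states]}
    _, side, phones, yes, no = best
    return {"side": side, "phones": phones,
            "yes": grow(yes, questions, min_gain, min_occ),
            "no": grow(no, questions, min_gain, min_occ)}


def lookup(tree, left, right):
    """Answer the questions for any context, seen or unseen, to reach a leaf."""
    node = tree
    while "tied" not in node:
        phone = left if node["side"] == "left" else right
        node = node["yes"] if phone in node["phones"] else node["no"]
    return node


# Hypothetical question set and toy statistics for one state of phone "ih".
questions = [
    ("L_Nasal", "left", {"m", "n", "ng"}),
    ("L_Fricative", "left", {"s", "f", "z", "v"}),
    ("R_Voiced", "right", {"d", "b", "g"}),
]
states = [
    StateStats("m", "t", 100.0, 1.0, 0.5),
    StateStats("n", "d", 120.0, 1.1, 0.5),
    StateStats("s", "t", 90.0, -1.0, 0.5),
    StateStats("f", "d", 80.0, -0.9, 0.5),
]
tree = grow(states, questions, min_gain=5.0, min_occ=50.0)
unseen_leaf = lookup(tree, "ng", "b")  # ng-ih+b never occurs in `states`
```

In this toy setup the tree splits once on left context, and the unseen context ng-ih+b reaches the same leaf as the seen nasal contexts m-ih+t and n-ih+d. Because the questions depend only on the phonetic context, any triphone can be routed to a leaf, which is the property the abstract refers to as a mapping for unseen triphones.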