Mandarin pronunciation modeling based on CASS corpus

Authors:
Zheng Fang; Song Zhanjiang; Pascale Fung; William Byrne
Affiliations:
Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P.R. China; Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P.R. China; Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Hong Kong, P.R. China; Center for Language and Speech Processing, The Johns Hopkins University
Venue:
Journal of Computer Science and Technology
Year:
2002

Abstract

Pronunciation variability is an important issue that must be addressed when developing practical automatic recognition systems for spontaneous speech. This paper analyzes the factors that may affect recognition performance, including those specific to the Chinese language. By studying the INITIAL/FINAL (IF) characteristics of Chinese and developing the Bayesian equation, the paper proposes the concepts of generalized INITIAL/FINAL (GIF) and generalized syllable (GS), GIF modeling and IF-GIF modeling, and context-dependent pronunciation weighting, all based on a phonetically well-transcribed seed database. With these methods, the Chinese syllable error rate (SER) is reduced by 6.3% and 4.2% compared with GIF modeling and IF modeling, respectively, when no language model, such as a syllable or word N-gram, is used. The methods also remain effective when additional data without phonetic transcription are used to refine the acoustic model through the proposed iterative forced-alignment based transcribing (IFABT) method, which achieves a 5.7% SER reduction.
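
As a rough illustration of what context-dependent pronunciation weighting can involve, the Python sketch below estimates the probability of a surface unit given its canonical unit and left context by relative frequency over aligned transcriptions. The abstract does not describe the estimation procedure, so this is only one plausible reading and not the authors' method; the function name, the triple layout, and the toy labels are all hypothetical.

```python
from collections import Counter, defaultdict


def pronunciation_weights(aligned):
    """Relative-frequency estimate of P(surface | left_context, base).

    `aligned` is an iterable of (left_context, base, surface) triples,
    an assumed layout for canonical units aligned with the units that
    were actually pronounced in a phonetically transcribed corpus.
    """
    counts = defaultdict(Counter)
    for left, base, surface in aligned:
        counts[(left, base)][surface] += 1

    weights = {}
    for key, surface_counts in counts.items():
        total = sum(surface_counts.values())
        weights[key] = {s: n / total for s, n in surface_counts.items()}
    return weights


# Toy, made-up alignments (not taken from the CASS corpus).
toy = [
    ("a", "zh", "zh"),
    ("a", "zh", "zh"),
    ("a", "zh", "z"),
    ("i", "zh", "z"),
]
weights = pronunciation_weights(toy)
```

In a recognizer, weights of this kind would typically scale the acoustic score of each pronunciation variant; sparse contexts would need smoothing or backoff to context-independent estimates, which this sketch omits.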