Korean large vocabulary continuous speech recognition with morpheme-based recognition units

  • Authors:
  • Oh-Wook Kwon; Jun Park

  • Affiliations:
  • Brain Science Research Center, KAIST, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, South Korea; Spoken Language Processing Team, ETRI, 161 Gajeong-dong, Yuseong-gu, Daejeon 305-350, South Korea

  • Venue:
  • Speech Communication
  • Year:
  • 2003

Abstract

In written Korean, a space is placed between two adjacent word-phrases, each of which roughly corresponds, in a semantic sense, to two or three English words. If the word-phrase is used as the recognition unit for Korean large vocabulary continuous speech recognition (LVCSR), the out-of-vocabulary (OOV) rate becomes very large. If a morpheme or a syllable is used instead, a severe inter-morpheme coarticulation problem arises because many morphemes are short. We propose to use merged morphemes as recognition units together with pronunciation-dependent entries in the language model (LM), so that these difficulties are reduced and the between-word phonology rules of Korean can be incorporated into the decoding algorithm of a Korean LVCSR system. Starting from the original morpheme units defined by Korean morphology, we merge pairs of short and frequent morphemes into larger units using a rule-based method and a statistical method. We define the merged morpheme unit as a word and use it as the recognition unit. The system was evaluated on two business-related tasks: a read speech recognition task and a broadcast news transcription task. In both tasks, the OOV rate was reduced to a level comparable to that of American English. In the read speech recognition task, with a 32k vocabulary and a word-based trigram LM estimated from a newspaper text corpus, the word error rate (WER) of the baseline system was reduced from 25.0% to 20.0% by cross-word modeling and pronunciation-dependent language modeling, and further to 15.5% by enlarging the speech database and text corpora. In the broadcast news transcription task, the statistical merging method reduced the WER of the baseline system without morpheme merging by 3.4% relative, and both proposed merging methods yielded similar performance. Applying all the proposed techniques, we achieved a WER of 17.6% for clean speech and 27.7% for noisy speech.
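The abstract does not spell out the statistical merging procedure, but one plausible reading is an iterative, frequency-driven merge of adjacent short morphemes, broadly similar to byte-pair-style unit merging. The sketch below only illustrates that idea under stated assumptions: the corpus is represented as lists of morpheme strings, and the function name merge_frequent_morpheme_pairs and the parameters max_len, min_count, and num_merges are hypothetical, not taken from the paper.

    from collections import Counter

    def merge_frequent_morpheme_pairs(corpus, max_len=4, min_count=1000, num_merges=500):
        """Greedily merge frequent pairs of adjacent short morphemes into larger units.

        corpus: list of sentences, each a list of morpheme strings.
        Returns a rewritten corpus in which merged units are joined with '+'.
        """
        for _ in range(num_merges):
            # Count adjacent pairs whose members are both short morphemes.
            pair_counts = Counter(
                (a, b)
                for sent in corpus
                for a, b in zip(sent, sent[1:])
                if len(a) <= max_len and len(b) <= max_len
            )
            if not pair_counts:
                break
            (a, b), count = pair_counts.most_common(1)[0]
            if count < min_count:
                break  # stop when no remaining pair is frequent enough to merge
            merged = a + "+" + b  # '+' marks the morpheme boundary inside the new unit
            rewritten = []
            for sent in corpus:
                out, i = [], 0
                while i < len(sent):
                    if i + 1 < len(sent) and sent[i] == a and sent[i + 1] == b:
                        out.append(merged)  # replace the chosen pair with the merged unit
                        i += 2
                    else:
                        out.append(sent[i])
                        i += 1
                rewritten.append(out)
            corpus = rewritten
        return corpus

    # Toy usage with made-up romanized morphemes (illustrative only):
    sents = [["hak", "gyo", "e", "gan", "da"]] * 3
    print(merge_frequent_morpheme_pairs(sents, max_len=3, min_count=2, num_merges=10))

The resulting merged units can then be treated as the lexicon entries and LM tokens of the recognizer; the paper's actual merge criterion, thresholds, and handling of pronunciation-dependent LM entries are described in the full text rather than in the abstract.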