Improved recognition of spontaneous Hungarian speech: morphological and acoustic modeling techniques for a less resourced task

Authors:
Péter Mihajlik;Zoltán Tüske;Balázs Tarján;Bottyán Németh;Tibor Fegyó
Affiliations:
Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary and THINKTech Research Center Nonprofit LLC, Vác, Hungary;Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary;Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary;Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary;Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary and AITIA International, Inc, Budapest, Hungary
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

Various morphological and acoustic modeling techniques are evaluated on a less-resourced, spontaneous Hungarian large-vocabulary continuous speech recognition (LVCSR) task. Among morphologically rich languages, Hungarian is known for its agglutinative, inflective nature, which aggravates the data sparseness caused by a relatively small training database. Although Hungarian spelling is considered simple and phonological, a large part of the corpus is covered by words that are pronounced in multiple, phonemically different ways. Data-driven vocabulary decomposition methods and methods supported by language-specific knowledge are investigated in combination with phoneme- and grapheme-based acoustic modeling techniques on the given task. Both statistical and grammatical vocabulary decomposition methods significantly outperform the word baseline and the morph-based advanced baseline. Although the discussed morph-based techniques recognize a significant number of out-of-vocabulary words, the improvements are due not to this but to the reduction of insertion errors. Applying grapheme-based acoustic models instead of phoneme-based models causes no severe deterioration in recognition performance. Moreover, a fully data-driven acoustic modeling technique combined with a statistical morphological modeling approach provides the best performance on the most difficult test set. The overall best speech recognition performance is obtained with a novel word-to-morph decomposition technique that combines grammatical and unsupervised statistical segmentation algorithms. The improvement achieved by the proposed technique is stable across acoustic modeling approaches and is larger when speaker adaptation is applied.