Slovenian spontaneous speech recognition and acoustic modeling of filled pauses and onomatopoeas

Authors:
Andrej Žgank;Tomaž Rotovnik;Mirjam Sepesy Maučec
Affiliations:
Laboratory for Digital Signal Processing, University of Maribor, Maribor, Slovenia;Laboratory for Digital Signal Processing, University of Maribor, Maribor, Slovenia;Laboratory for Digital Signal Processing, University of Maribor, Maribor, Slovenia
Venue:
WSEAS Transactions on Signal Processing
Year:
2008

Abstract

This paper focuses on acoustic modeling for spontaneous speech recognition, which remains a very challenging task for the speech technology research community. The attributes of spontaneous speech can heavily degrade a speech recognizer's accuracy and performance. Filled pauses and onomatopoeias are one such important attribute and can considerably reduce accuracy. Although filled pauses do not carry any semantic information, they are still very important from the modeling perspective. This paper proposes a novel acoustic modeling approach in which filled pauses are modeled using phonetic broad classes that correspond to their acoustic-phonetic properties. The phonetic broad classes are language dependent and can be defined by an expert or in a data-driven way. The new filled pause modeling approach is compared with three other implicit filled pause modeling methods. All experiments were carried out using a speech recognition system based on context-dependent Hidden Markov Models. The Slovenian BNSI Broadcast News speech and text database, which contains manually transcribed recordings of TV news shows, was used for training and evaluation. The proposed acoustic modeling approach was evaluated on a set of spontaneous speech. The overall best filled pause acoustic modeling approach improved the speech recognizer's word accuracy by 5.70% relative to the baseline system, without affecting recognition time.
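
The idea of modeling filled pauses with phonetic broad classes can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the phone-to-class table `BROAD_CLASS`, the filled pause labels and pronunciations in `FILLED_PAUSES`, and the choice to merge adjacent identical classes into one unit. The sketch only shows how a filled pause lexicon entry might be rewritten from phone units to broad-class units before acoustic model training.

```python
# Hypothetical phone-to-broad-class table (an expert-defined example;
# the paper's actual class inventory is not given in the abstract).
BROAD_CLASS = {
    "a": "VOWEL", "e": "VOWEL", "i": "VOWEL", "o": "VOWEL", "u": "VOWEL",
    "m": "NASAL", "n": "NASAL",
    "h": "FRICATIVE", "s": "FRICATIVE",
}

# Hypothetical filled pause entries with phone-level pronunciations.
FILLED_PAUSES = {
    "<eee>": ["e", "e", "e"],
    "<eem>": ["e", "e", "m"],
    "<mmm>": ["m", "m"],
}


def to_broad_classes(phones):
    """Map a phone sequence to broad-class units.

    Adjacent phones that fall into the same class are merged into one
    unit (an assumption of this sketch, not a claim about the paper).
    """
    units = []
    for phone in phones:
        cls = BROAD_CLASS[phone]
        if not units or units[-1] != cls:
            units.append(cls)
    return units


# Broad-class lexicon: each filled pause is now described by class units.
broad_lexicon = {
    label: to_broad_classes(phones) for label, phones in FILLED_PAUSES.items()
}

for label, units in broad_lexicon.items():
    print(label, " ".join(units))
```

In a sketch like this, a data-driven class definition would replace the hand-written `BROAD_CLASS` table with one derived from clustering, while the lexicon rewriting step would stay the same.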