Language Resources and Evaluation
This paper focuses on acoustic modeling for spontaneous speech recognition, which remains a challenging task for the speech technology research community. The attributes of spontaneous speech can heavily degrade a speech recognizer's accuracy and performance. Filled pauses and onomatopoeias are among the most important of these attributes and can considerably reduce accuracy. Although filled pauses do not carry any semantic information, they are still very important from the modeling perspective. This paper proposes a novel acoustic modeling approach in which filled pauses are modeled using broad phonetic classes that correspond to their acoustic-phonetic properties. The broad phonetic classes are language dependent and can be defined by an expert or in a data-driven way. The new filled pause modeling approach is compared with three other implicit filled pause modeling methods. All experiments were carried out using a speech recognition system based on context-dependent hidden Markov models. The Slovenian BNSI Broadcast News speech and text database, which contains manually transcribed recordings of TV news shows, was used for training and evaluation. The proposed acoustic modeling approach was evaluated on a set of spontaneous speech. The best overall filled pause acoustic modeling approach improved the speech recognizer's word accuracy by 5.70% relative to the baseline system, without affecting recognition time.
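To make the reported figure concrete, the following minimal sketch shows how a relative word accuracy improvement is commonly computed. It assumes the conventional word accuracy definition (N - S - D - I) / N and that "relative" means relative to the baseline accuracy; the abstract does not state either explicitly. The function names and all error counts are hypothetical and are not taken from the paper.

```python
def word_accuracy(n_ref: int, substitutions: int, deletions: int, insertions: int) -> float:
    """Word accuracy in percent, using the common (N - S - D - I) / N definition."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return 100.0 * (n_ref - substitutions - deletions - insertions) / n_ref


def relative_improvement(baseline_acc: float, new_acc: float) -> float:
    """Gain of new_acc over baseline_acc, in percent of the baseline accuracy."""
    if baseline_acc <= 0:
        raise ValueError("baseline_acc must be positive")
    return 100.0 * (new_acc - baseline_acc) / baseline_acc


# Hypothetical error counts for illustration only (NOT results from the paper).
baseline = word_accuracy(n_ref=1000, substitutions=250, deletions=100, insertions=50)
improved = word_accuracy(n_ref=1000, substitutions=230, deletions=90, insertions=46)

print(f"baseline accuracy: {baseline:.2f}%")
print(f"improved accuracy: {improved:.2f}%")
print(f"relative gain:     {relative_improvement(baseline, improved):.2f}%")
```

Under this reading, a relative gain of a few percent corresponds to a smaller absolute gain in accuracy points, so the two figures should not be confused when comparing systems.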