The LIA speech recognition system: from 10xRT to 1xRT

Authors:
G. Linarès;P. Nocera;D. Massonié;D. Matrouf
Affiliations:
Laboratoire Informatique d'Avignon, LIA, Avignon, France;Laboratoire Informatique d'Avignon, LIA, Avignon, France;Laboratoire Informatique d'Avignon, LIA, Avignon, France;Laboratoire Informatique d'Avignon, LIA, Avignon, France
Venue:
TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue
Year:
2007

Abstract

The LIA developed a speech recognition toolkit providing most of the components required by speech-to-text systems. This toolbox was used to build a Broadcast News (BN) transcription system that took part in the ESTER evaluation campaign ([1]), on both the unconstrained transcription and the real-time transcription tasks. In this paper, we describe the techniques we used to reach real-time operation, starting from our baseline 10xRT system. We focus on some aspects of the A* search algorithm which are critical for both efficiency and accuracy. Then, we evaluate the impact of the different system components (lexicon, language models and acoustic models) on the trade-off between efficiency and accuracy. Experiments are carried out in the framework of the ESTER evaluation campaign. Our results show that the real-time system is about 5.6% absolute WER (Word Error Rate) worse than the standard 10xRT system, reaching an absolute WER of about 26.8%.
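
The two quantities the abstract trades off, WER and the real-time factor (the "xRT" figure), can be illustrated with a minimal sketch. It assumes the conventional definitions: WER is the word-level edit distance between reference and hypothesis divided by the reference length, and the real-time factor is processing time divided by audio duration. The function names are hypothetical, and this is not the scoring tool used in the ESTER campaign, which is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, using word-level Levenshtein distance.

    Illustrative only: no text normalization is applied, and words are
    split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (1.0 means 1xRT)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

Under these definitions, a decoder that needs 600 seconds for 60 seconds of audio runs at 10xRT, and one substituted word in a four-word reference gives a WER of 25%. An "absolute" WER difference, as used in the abstract, is the plain subtraction of two such percentages.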