The ISL RT-07 Speech-to-Text System

Authors:
Matthias Wölfel;Sebastian Stüker;Florian Kraft
Affiliations:
Interactive Systems Laboratories Institut für Theoretische Informatik, Universität Karlsruhe (TH), Karlsruhe, Germany 76131;Interactive Systems Laboratories Institut für Theoretische Informatik, Universität Karlsruhe (TH), Karlsruhe, Germany 76131;Interactive Systems Laboratories Institut für Theoretische Informatik, Universität Karlsruhe (TH), Karlsruhe, Germany 76131
Venue:
Multimodal Technologies for Perception of Humans
Year:
2008

Abstract

This paper describes the 2007 meeting speech-to-text system for lecture rooms developed at the Interactive Systems Laboratories (ISL) for the multiple distant microphone condition. The system was evaluated in the RT-07 Rich Transcription Meeting Evaluation sponsored by the US National Institute of Standards and Technology (NIST). We describe the principal differences between our current system and those submitted in previous years, namely: the use of a signal-adaptive front-end (realized by warped-twice warped minimum variance distortionless response spectral estimation); improved acoustic models (including maximum mutual information estimation) and language models; cross-adaptation between systems that differ in the front-end as well as the phoneme set; the use of a discriminative criterion instead of the signal-to-noise ratio to select the channel to be used; and the use of decoder-based speech segmentation.