A Neural Multi-expert Classification System for MPEG Audio Segmentation
ICAPR '01 Proceedings of the Second International Conference on Advances in Pattern Recognition
ITS'10 Proceedings of the 10th international conference on Intelligent Tutoring Systems - Volume Part I
Speech recognition can be used to create searchable transcripts for audio indexing in digital video libraries. With current technologies, building or improving the acoustic models of a highly accurate speech recognition system requires large amounts of hand-transcribed speech training data. We present a technique that uses television broadcasts with closed captions as a source of large amounts of automatically extracted, accurately transcribed speech for improving acoustic models. The error-prone closed-caption text is aligned with the similarly error-prone speech recognition output, and the segments on which the two agree are paired with the corresponding audio and used as acoustic training data to improve the speech recognition system. Our technique automatically extracted 131.4 hours of transcribed speech and reduced the word error rate of our current best speech recognition system (Sphinx-III) from 32.82% to 31.19%. A speech recognizer trained exclusively on 70.7 hours of this automatically transcribed speech produced a word error rate of 32.7%.
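The alignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which alignment algorithm the authors used, so this example substitutes Python's standard `difflib.SequenceMatcher` as a stand-in. The function name `matching_segments`, the `min_len` threshold, and the `(word, start_sec, end_sec)` recognizer output format are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def matching_segments(caption_words, asr_words, min_len=3):
    """Find runs of consecutive words on which captions and recognizer agree.

    caption_words: list of caption words (no timing information).
    asr_words: list of (word, start_sec, end_sec) tuples; this timed format
        is a hypothetical recognizer output, not taken from the source.
    min_len: assumed minimum run length, to discard chance agreements.
    """
    ref = [w.lower() for w in caption_words]
    hyp = [w.lower() for w, _, _ in asr_words]

    # difflib is a stand-in aligner; autojunk is disabled so that frequent
    # words such as "the" are not dropped from consideration.
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)

    segments = []
    for block in matcher.get_matching_blocks():
        if block.size < min_len:
            continue  # also skips the zero-length sentinel block
        first = asr_words[block.b]
        last = asr_words[block.b + block.size - 1]
        segments.append({
            "words": ref[block.a:block.a + block.size],
            "start": first[1],  # audio start of the agreed run
            "end": last[2],     # audio end of the agreed run
        })
    return segments


# Toy example: both sources contain errors, but in different places.
captions = "the president said today that the economy is growing".split()
recognized = "the president said to day that the economy is slowing".split()
# Assign each recognized word a made-up 0.5 second slot.
asr = [(w, i * 0.5, (i + 1) * 0.5) for i, w in enumerate(recognized)]

segments = matching_segments(captions, asr)
for seg in segments:
    print(seg["start"], seg["end"], " ".join(seg["words"]))
```

Each returned segment pairs an agreed word sequence with a time span in the audio, which is the form of data the abstract describes feeding back into acoustic training. Requiring a minimum run length is one simple way to reduce the chance that two erroneous sources agree by accident; the threshold a real system would use is not given in the source.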