Speech recognition systems are expensive to train, mostly because of the high cost of annotating training data. We previously proposed an iterative training selection algorithm [1] that seeks to improve speech recognition by automatically selecting a subset of the available manually transcribed training data, thereby reducing error rates without incurring additional transcription cost. We suggest one improvement to that "selective sampling" algorithm and show that it reduces the error rate on a particular alphadigit recognition problem from 10.3% to 9.5%. We then extend the iterative training selection algorithm to work with untranscribed speech, so that it guides the selection of speech that is subsequently transcribed. On a particular alphadigit recognition problem, we show that it is possible to match the baseline error rate while incurring only 25% of the transcription cost.
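
The general shape of an iterative selection loop over untranscribed speech can be sketched as follows. This is a minimal illustration and not the algorithm of [1]: the abstract does not state the selection criterion, so the sketch assumes the lowest-confidence utterances are sent for transcription. The names `train`, `confidence`, `batch_size`, and `budget` are hypothetical. The last function computes the relative error reduction implied by the reported drop from 10.3% to 9.5%.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical type: maps an utterance id to a recognizer confidence score.
Confidence = Callable[[str], float]


def select_for_transcription(
    pool: Sequence[str], confidence: Confidence, batch_size: int
) -> List[str]:
    """Return the batch_size utterances with the lowest confidence.

    Assumption: low confidence marks the most informative utterances.
    The source abstract does not specify the actual criterion.
    """
    return sorted(pool, key=confidence)[:batch_size]


def iterative_selection(
    pool: Sequence[str],
    train: Callable[[Set[str]], Confidence],
    seed: Sequence[str],
    batch_size: int,
    budget: int,
) -> Set[str]:
    """Grow the transcribed set until `budget` utterances are transcribed."""
    transcribed = set(seed)
    while len(transcribed) < budget:
        # Retrain on what is transcribed so far, then rescore the rest.
        confidence = train(transcribed)
        remaining = [u for u in pool if u not in transcribed]
        if not remaining:
            break
        take = min(batch_size, budget - len(transcribed))
        transcribed.update(select_for_transcription(remaining, confidence, take))
    return transcribed


def relative_reduction(before: float, after: float) -> float:
    """Relative error-rate reduction, e.g. from 10.3 to 9.5 percent."""
    return (before - after) / before
```

Under these assumptions, a transcription budget of 25% of the pool would correspond to `budget = len(pool) // 4`, and `relative_reduction(10.3, 9.5)` gives roughly 0.078, that is, an absolute reduction of 0.8 points or a relative reduction of about 7.8%.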