Generating training data for medical dictations
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Pattern Recognition and Machine Learning (Information Science and Statistics)
Semantic and phonetic automatic reconstruction of medical dictations
Computer Speech and Language
Automatic phonetic reconstruction of medical dictations from non-literal, automatically recognized speech transcripts yields closer-to-literal transcripts for training the language models of speech recognizers. In this paper, we introduce an extended alignment method that assesses multiple levels of text segmentation, and we show how open issues such as wrong segmentation in the recognized transcript can be resolved. Furthermore, we compare a rule-based text reconstruction approach with an automatic classifier that uses the multi-level alignment and a stochastic phonetic similarity measure as features. Experiments show better performance for the rule-based system in terms of recall and precision, but superiority of the automatic classifier in terms of language model perplexity. The overall gain in precision over the simple system in [1] is between 0.7% and 4.7% absolute, without loss in recall.
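To illustrate the kind of alignment the abstract refers to, the following is a minimal, hypothetical sketch: a word-level dynamic-programming alignment between a non-literal transcript and a recognized transcript, where substitutions are scored by a similarity function. The paper uses a stochastic phonetic similarity measure; here a simple character-level ratio (`difflib.SequenceMatcher`) stands in for it, and all function names are our own, not the authors'.

```python
from difflib import SequenceMatcher

def phonetic_similarity(a: str, b: str) -> float:
    # Stand-in for the paper's stochastic phonetic similarity measure:
    # a character-level match ratio in [0, 1] (deliberate simplification).
    return SequenceMatcher(None, a, b).ratio()

def align(ref, hyp):
    """Word-level alignment via Levenshtein-style dynamic programming.

    Returns a list of (ref_word_or_None, hyp_word_or_None) pairs:
    matched/substituted pairs, deletions (hyp side None), and
    insertions (ref side None)."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimal cost of aligning ref[:i] with hyp[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = float(i)
    for j in range(1, m + 1):
        cost[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Substitution is cheap when the words sound (here: look) alike.
            sub = 1.0 - phonetic_similarity(ref[i - 1], hyp[j - 1])
            cost[i][j] = min(cost[i - 1][j] + 1.0,        # deletion
                             cost[i][j - 1] + 1.0,        # insertion
                             cost[i - 1][j - 1] + sub)    # (mis)match
    # Backtrace the cheapest path to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1]
                + (1.0 - phonetic_similarity(ref[i - 1], hyp[j - 1]))):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1.0:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    return pairs[::-1]
```

The multi-level aspect described in the paper would apply such an alignment at more than one segmentation granularity (e.g. words and sub-word units) rather than at the single word level shown here.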