Semantic and phonetic automatic reconstruction of medical dictations

  • Authors:
  • Stefan Petrik, Christina Drexel, Leo Fessler, Jeremy Jancsary, Alexandra Klein, Gernot Kubin, Johannes Matiasek, Franz Pernkopf, Harald Trost

  • Affiliations:
  • Signal Processing & Speech Communication Laboratory, Graz University of Technology, Graz, Austria; Nuance Communications Austria, Vienna, Austria; Austrian Research Institute for Artificial Intelligence, Vienna, Austria; Institute of Medical Cybernetics and Artificial Intelligence of the Center for Brain Research, Medical University Vienna, Austria

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2011

Abstract

Automatic speech recognition (ASR) has become a valuable tool in high-volume document production environments such as medical dictation. While manual post-processing is still needed for correcting speech recognition errors and for creating documents that adhere to various stylistic and formatting conventions, a large part of the document production process is carried out by the ASR system. Improving the quality of the system output requires knowledge about the multi-layered relationship between the dictated texts and the final documents. With such knowledge, typical speech-recognition errors can be avoided, and proper style and formatting can be anticipated in the ASR part of the document production process. Yet, while vast amounts of recognition results and manually edited final reports are constantly being produced, error-free literal transcripts of the actually dictated texts remain a scarce and costly resource, since they have to be created by manually transcribing the audio files. To obtain large corpora of literal transcripts for medical dictation, we propose a method for automatically reconstructing them from draft speech-recognition transcripts together with the corresponding final medical reports. The main innovative aspect of our method is the combination of two independent knowledge sources: phonetic information for the identification of speech-recognition errors and semantic information for detecting post-editing of format and style. Speech recognition results and final reports are first aligned, then matched based on semantic and phonetic similarity, and finally categorised and selectively combined into a reconstruction hypothesis. This method can be used for various applications in language technology, e.g., ASR adaptation, document production, or, more generally, the development of parallel corpora from non-literal text resources. In an experimental evaluation, which also includes an assessment of the quality of the reconstructed transcripts compared to manual transcriptions, the described method yields a relative word error rate reduction of 7.74% after retraining the standard language model on reconstructed transcripts.
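
The pipeline sketched in the abstract (align draft and report, match token pairs on phonetic and semantic similarity, then selectively combine them into a reconstruction hypothesis) can be illustrated with a toy example. The sketch below is not the authors' implementation: character overlap stands in for a real phonetic distance (a production system would compare phoneme strings from a grapheme-to-phoneme converter), a hand-made lookup table stands in for the semantic resource, and all tokens and the threshold value are invented for illustration.

```python
# Minimal sketch of semantic/phonetic transcript reconstruction.
# NOT the paper's implementation: difflib character overlap is a
# stand-in for phonetic distance, and SEMANTIC_EQUIV is a stand-in
# for a semantic resource; tokens and threshold are invented.
from difflib import SequenceMatcher

def phonetic_similarity(a: str, b: str) -> float:
    """Crude proxy: character-level overlap of lowercased tokens."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Toy semantic equivalences (dictated form <-> formatted form).
SEMANTIC_EQUIV = {("two", "2"), ("milligrams", "mg")}

def semantically_equivalent(a: str, b: str) -> bool:
    a, b = a.lower().strip("."), b.lower().strip(".")
    return (a, b) in SEMANTIC_EQUIV or (b, a) in SEMANTIC_EQUIV

def reconstruct(draft, report, phon_threshold=0.6):
    """Align ASR draft and final report, then pick, pair by pair,
    the token more likely to reflect what was actually dictated."""
    out = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, draft, report).get_opcodes():
        if op in ("equal", "delete"):
            out.extend(draft[i1:i2])  # identical, or removed by the
                                      # editor: either way it was dictated
        elif op == "replace":
            d_seg, r_seg = draft[i1:i2], report[j1:j2]
            n = min(len(d_seg), len(r_seg))
            for d, r in zip(d_seg[:n], r_seg[:n]):
                if phonetic_similarity(d, r) >= phon_threshold:
                    out.append(r)  # sounds alike: ASR error, trust report
                elif semantically_equivalent(d, r):
                    out.append(d)  # same meaning: style edit, trust draft
                else:
                    out.append(d)  # ambiguous: keep the draft token
            out.extend(d_seg[n:])  # leftover draft tokens were dictated
        # op == "insert": editor additions (headers, boilerplate)
        # were never dictated, so they are skipped
    return out

draft  = "patient receives two mg twice daily by effusion".split()
report = "Patient receives 2 mg b.i.d. by infusion.".split()
print(" ".join(reconstruct(draft, report)))
# -> Patient receives two mg twice daily by infusion.
```

Note the asymmetry, which mirrors the abstract's two knowledge sources: where the draft and report sound alike, the report is trusted (the difference is presumably a recognition error the editor corrected), whereas semantically equivalent but phonetically dissimilar pairs are treated as format or style edits, so the draft is assumed to preserve the dictated wording.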