Semantic and phonetic automatic reconstruction of medical dictations

  • Authors:
  • Stefan Petrik, Christina Drexel, Leo Fessler, Jeremy Jancsary, Alexandra Klein, Gernot Kubin, Johannes Matiasek, Franz Pernkopf, Harald Trost

  • Affiliations:
  • Signal Processing & Speech Communication Laboratory, Graz University of Technology, Graz, Austria; Nuance Communications Austria, Vienna, Austria; Austrian Research Institute for Artificial Intelligence, Vienna, Austria; Institute of Medical Cybernetics and Artificial Intelligence of the Center for Brain Research, Medical University Vienna, Austria

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2011

Abstract

Automatic speech recognition (ASR) has become a valuable tool in high-volume document production environments such as medical dictation. While manual post-processing is still needed for correcting speech recognition errors and for creating documents that adhere to various stylistic and formatting conventions, a large part of the document production process is carried out by the ASR system. Improving the quality of the system output requires knowledge about the multi-layered relationship between the dictated texts and the final documents. With such knowledge, typical speech-recognition errors can be avoided, and proper style and formatting can be anticipated in the ASR part of the document production process. Yet, while vast amounts of recognition results and manually edited final reports are constantly being produced, error-free literal transcripts of the actually dictated texts remain a scarce and costly resource, since they have to be created by manually transcribing the audio files. To obtain large corpora of literal transcripts for medical dictation, we propose a method for automatically reconstructing them from draft speech-recognition transcripts together with the corresponding final medical reports. The main innovative aspect of our method is the combination of two independent knowledge sources: phonetic information for the identification of speech-recognition errors and semantic information for detecting post-editing of format and style. Speech recognition results and final reports are first aligned, then matched based on semantic and phonetic similarity, and finally categorised and selectively combined into a reconstruction hypothesis. This method can be used for various applications in language technology, e.g., ASR adaptation, document production, or, more generally, the development of parallel corpora from non-literal text resources. In an experimental evaluation, which also includes an assessment of the quality of the reconstructed transcripts compared to manual transcriptions, the described method yields a relative word error rate reduction of 7.74% after retraining the standard language model on reconstructed transcripts.
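
The pipeline sketched in the abstract (align draft and report, match token pairs on phonetic and semantic similarity, then selectively combine them into a reconstruction hypothesis) can be illustrated with a toy example. The sketch below is not the authors' implementation: character overlap stands in for a real phonetic distance (a production system would compare phoneme strings from a grapheme-to-phoneme converter), a hand-made lookup table stands in for the semantic resource, and all tokens and the threshold value are invented for illustration.

```python
# Minimal sketch of semantic/phonetic transcript reconstruction.
# NOT the paper's implementation: difflib character overlap is a
# stand-in for phonetic distance, and SEMANTIC_EQUIV is a stand-in
# for a semantic resource; tokens and threshold are invented.
from difflib import SequenceMatcher

def phonetic_similarity(a: str, b: str) -> float:
    """Crude proxy: character-level overlap of lowercased tokens."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Toy semantic equivalences (dictated form <-> formatted form).
SEMANTIC_EQUIV = {("two", "2"), ("milligrams", "mg")}

def semantically_equivalent(a: str, b: str) -> bool:
    a, b = a.lower().strip("."), b.lower().strip(".")
    return (a, b) in SEMANTIC_EQUIV or (b, a) in SEMANTIC_EQUIV

def reconstruct(draft, report, phon_threshold=0.6):
    """Align ASR draft and final report, then pick, pair by pair,
    the token more likely to reflect what was actually dictated."""
    out = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, draft, report).get_opcodes():
        if op in ("equal", "delete"):
            out.extend(draft[i1:i2])  # identical, or removed by the
                                      # editor: either way it was dictated
        elif op == "replace":
            d_seg, r_seg = draft[i1:i2], report[j1:j2]
            n = min(len(d_seg), len(r_seg))
            for d, r in zip(d_seg[:n], r_seg[:n]):
                if phonetic_similarity(d, r) >= phon_threshold:
                    out.append(r)  # sounds alike: ASR error, trust report
                elif semantically_equivalent(d, r):
                    out.append(d)  # same meaning: style edit, trust draft
                else:
                    out.append(d)  # ambiguous: keep the draft token
            out.extend(d_seg[n:])  # leftover draft tokens were dictated
        # op == "insert": editor additions (headers, boilerplate)
        # were never dictated, so they are skipped
    return out

draft  = "patient receives two mg twice daily by effusion".split()
report = "Patient receives 2 mg b.i.d. by infusion.".split()
print(" ".join(reconstruct(draft, report)))
# -> Patient receives two mg twice daily by infusion.
```

Note the asymmetry, which mirrors the abstract's two knowledge sources: where the draft and report sound alike, the report is trusted (the difference is presumably a recognition error the editor corrected), whereas semantically equivalent but phonetically dissimilar pairs are treated as format or style edits, so the draft is assumed to preserve the dictated wording.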