Revealing the structure of medical dictations with conditional random fields

Authors:
Jeremy Jancsary; Johannes Matiasek; Harald Trost
Affiliations:
Austrian Research Institute for Artificial Intelligence, Vienna, Freyung; Austrian Research Institute for Artificial Intelligence, Vienna, Freyung; Medical University Vienna, Austria
Venue:
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2008

Abstract

Automatic processing of medical dictations poses a significant challenge. We approach the problem by introducing a statistical framework capable of identifying types and boundaries of sections, lists and other structures occurring in a dictation, thereby gaining explicit knowledge about the function of such elements. Training data is created semi-automatically by aligning a parallel corpus of corrected medical reports and corresponding transcripts generated via automatic speech recognition. We highlight the properties of our statistical framework, which is based on conditional random fields (CRFs) and implemented as an efficient, publicly available toolkit. Finally, we show that our approach is effective both under ideal conditions and for real-life dictation involving speech recognition errors and speech-related phenomena such as hesitations and repetitions.
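
The abstract does not say how the corrected reports and the recognizer transcripts are aligned, so the following is only a rough sketch of the general idea of projecting structure labels across an alignment. It uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner; the function name `project_labels`, the label names `Heading` and `Body`, and the rule that unmatched tokens inherit the preceding label are all assumptions made for illustration, not the authors' method.

```python
from difflib import SequenceMatcher


def project_labels(report_tokens, report_labels, transcript_tokens, default="O"):
    """Copy structure labels from report tokens onto aligned transcript tokens.

    Hypothetical helper: the alignment is a plain longest-matching-block
    alignment from difflib, not the alignment procedure used in the paper.
    """
    matcher = SequenceMatcher(
        a=[t.lower() for t in report_tokens],
        b=[t.lower() for t in transcript_tokens],
        autojunk=False,
    )
    projected = [None] * len(transcript_tokens)

    # Tokens inside a matching block take the label of their report counterpart.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            projected[block.b + k] = report_labels[block.a + k]

    # Assumption: unmatched transcript tokens (recognition errors, hesitations,
    # repetitions) inherit the label of the preceding transcript token.
    previous = default
    for i, label in enumerate(projected):
        if label is None:
            projected[i] = previous
        else:
            previous = label
    return projected


if __name__ == "__main__":
    report = ["Diagnosis", "Acute", "bronchitis", "Plan", "Rest", "and", "fluids"]
    labels = ["Heading", "Body", "Body", "Heading", "Body", "Body", "Body"]
    transcript = ["diagnosis", "uh", "acute", "bronchitis", "plan",
                  "rest", "rest", "and", "fluids"]
    for token, label in zip(transcript, project_labels(report, labels, transcript)):
        print(f"{token}\t{label}")
```

Labelled transcripts produced this way could then serve as training sequences for a sequence labeller such as a CRF. A real pipeline would likely need a more robust alignment and a better policy for unmatched tokens than this sketch provides.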