Automatic processing of medical dictations poses a significant challenge. We approach the problem by introducing a statistical framework capable of identifying the types and boundaries of sections, lists, and other structures occurring in a dictation, thereby gaining explicit knowledge about the function of such elements. Training data is created semi-automatically by aligning a parallel corpus of corrected medical reports with the corresponding transcripts generated via automatic speech recognition. We highlight the properties of our statistical framework, which is based on conditional random fields (CRFs) and implemented as an efficient, publicly available toolkit. Finally, we show that our approach is effective both under ideal conditions and for real-life dictation involving speech recognition errors and speech-related phenomena such as hesitations and repetitions.
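As a rough illustration of the semi-automatic training-data step, the sketch below projects per-token structure labels from a corrected report onto a recognizer transcript by aligning the two token sequences. It is only a minimal sketch under stated assumptions: the alignment here uses Python's standard `difflib.SequenceMatcher`, which is not claimed to be the alignment method of the work described, and the function name `project_labels`, the label names, and the example tokens are all hypothetical.

```python
from difflib import SequenceMatcher


def project_labels(report_tokens, report_labels, asr_tokens, default="O"):
    """Copy per-token labels from a corrected report onto an ASR transcript.

    Hypothetical helper: the alignment strategy (difflib) and the label
    scheme are illustrative assumptions, not the method of the source work.
    """
    if len(report_tokens) != len(report_labels):
        raise ValueError("each report token needs exactly one label")

    # autojunk=False keeps frequent tokens from being ignored in long inputs.
    matcher = SequenceMatcher(a=report_tokens, b=asr_tokens, autojunk=False)
    asr_labels = [default] * len(asr_tokens)

    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical runs: labels carry over one to one.
            asr_labels[j1:j2] = report_labels[i1:i2]
        elif tag == "replace":
            # Differing runs (e.g. misrecognized words): reuse the label of
            # the first report token in the span for every transcript token.
            for j in range(j1, j2):
                asr_labels[j] = report_labels[i1]
        # "insert": transcript-only tokens (e.g. hesitations) keep `default`.
        # "delete": report-only tokens have no transcript counterpart.

    return asr_labels


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    report = ["history", "patient", "has", "fever", "plan", "rest"]
    labels = ["HEADING", "HISTORY", "HISTORY", "HISTORY", "HEADING", "PLAN"]
    asr = ["history", "uh", "patient", "has", "a", "fever", "plan", "rest"]
    for token, label in zip(asr, project_labels(report, labels, asr)):
        print(f"{token}\t{label}")
```

The resulting token and label pairs are the kind of sequence data on which a linear-chain CRF could then be trained. Tokens that appear only in the transcript, such as hesitations, keep the default label in this sketch. A real pipeline would need a deliberate policy for them.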