Applying semantic-based probabilistic context-free grammar to medical language processing - A preliminary study on parsing medication sentences

Authors:
Hua Xu;Samir AbdelRahman;Yanxin Lu;Joshua C. Denny;Son Doan
Affiliations:
Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, TN, USA;Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, TN, USA;National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention, Shanghai, China;Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, TN, USA and Department of Medicine, Vanderbilt University, School of Medicine, Nashville, TN, USA;National Institute of Informatics, Tokyo, Japan
Venue:
Journal of Biomedical Informatics
Year:
2011



Abstract

Semantic-based sublanguage grammars have been shown to be an efficient method for medical language processing. However, given the complexity of the medical domain, parsers using such grammars inevitably encounter ambiguous sentences, which can be interpreted by different groups of production rules and consequently yield two or more parse trees. One possible solution, which has not been extensively explored previously, is to augment the productions in medical sublanguage grammars with probabilities to resolve the ambiguity. In this study, we associated probabilities with the production rules of a semantic-based grammar for medication findings and evaluated how well the resulting grammar reduced parsing ambiguity. Using the existing data set from the 2009 i2b2 NLP (Natural Language Processing) challenge for medication extraction, we developed a semantic-based CFG (Context-Free Grammar) for parsing medication sentences and manually created a Treebank of 4564 medication sentences from discharge summaries. From the Treebank, we derived a semantic-based PCFG (Probabilistic Context-Free Grammar) for parsing medication sentences. Our evaluation using 10-fold cross-validation showed that the PCFG parser dramatically improved parsing performance compared to the CFG parser.
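
To illustrate the general idea of deriving rule probabilities from a treebank and using them to choose between competing parses, here is a minimal Python sketch. It is not the authors' implementation: the semantic labels (MED, DRUG, DOSE, FREQ), the token classes (name, num, unit, freq), and the five-tree treebank are hypothetical, and the estimation method shown (relative-frequency counts) is an assumption, since the abstract does not say how the probabilities were computed. The sketch also only ranks candidate trees that are already given; a full parser would search for them with a chart algorithm.

```python
from collections import Counter
from math import prod

# A tree is a tuple (label, child, child, ...); a leaf child is a plain string.
# All labels and token classes below are hypothetical, for illustration only.

def productions(tree):
    """Yield (lhs, rhs) for every internal node of the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter(r for tree in treebank for r in productions(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

def tree_probability(tree, pcfg):
    """Probability of a tree = product of its rule probabilities (0 if unseen)."""
    return prod(pcfg.get(rule, 0.0) for rule in productions(tree))

# Toy treebank over token classes rather than real words (an assumption).
treebank = [
    ("MED", ("DRUG", "name"), ("DOSE", "num", "unit")),
    ("MED", ("DRUG", "name"), ("DOSE", "num", "unit"), ("FREQ", "freq")),
    ("MED", ("DRUG", "name"), ("DOSE", "num")),
    ("MED", ("DRUG", "name", "num"), ("DOSE", "num", "unit")),
    ("MED", ("DRUG", "name", "num")),
]
pcfg = estimate_pcfg(treebank)

# Two candidate parses of the same ambiguous input "name num":
# the number is either a dose or part of the drug name.
tree_a = ("MED", ("DRUG", "name"), ("DOSE", "num"))
tree_b = ("MED", ("DRUG", "name", "num"))

# Disambiguate by keeping the most probable tree.
best = max([tree_a, tree_b], key=lambda t: tree_probability(t, pcfg))
print(best, tree_probability(best, pcfg))
```

A plain CFG would accept both candidate trees and could not prefer one; attaching probabilities to the rules gives a ranking, which is the kind of ambiguity reduction the study evaluates.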