Deriving a probabilistic syntacto-semantic grammar for biomedicine based on domain-specific terminologies

Authors:
Jung-Wei Fan;Carol Friedman
Affiliations:
Department of Biomedical Informatics, Columbia University, New York, NY, USA and Systems Solutions and Deployment, Kaiser Permanente Southern California, Pasadena, CA, USA;Department of Biomedical Informatics, Columbia University, New York, NY, USA
Venue:
Journal of Biomedical Informatics
Year:
2011

Abstract

Biomedical natural language processing (BioNLP) unlocks valuable information stored in textual data for practice and research. Syntactic parsing is a critical component of BioNLP applications that rely on correctly determining the sentence and phrase structure of free text. In addition to handling the vast number of domain-specific terms, a robust biomedical parser needs to model the semantic grammar to obtain viable syntactic structures. With either a rule-based or a corpus-based approach, the grammar engineering process requires substantial time and knowledge from experts, and does not always yield a semantically transferable grammar. To reduce the human effort and to promote semantic transferability, we propose an automated method for deriving a probabilistic grammar from a training corpus consisting of concept strings and semantic classes from the Unified Medical Language System (UMLS), a comprehensive terminology resource widely used by the community. The grammar is designed to specify noun phrases only, because the majority of biomedical terminological concepts are nominal. Evaluated on manually parsed clinical notes, the derived grammar achieved a recall of 0.644, a precision of 0.737, and an average cross-bracketing of 0.61, outperforming a control grammar with the semantic information removed. Error analysis revealed shortcomings that could be addressed to improve performance. The results indicate the feasibility of an approach that automatically incorporates terminology semantics in building an operational grammar. Although the current performance of the unsupervised solution is not adequate to replace manual engineering, we believe that once the performance issues are addressed, it could serve as an aid in a semi-supervised solution.
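The recall, precision, and cross-bracketing figures reported in the abstract are bracket-based parse metrics. The abstract does not specify the scoring procedure, so the following is only a minimal sketch of how such scores are commonly computed for a single sentence. It assumes unlabeled constituents represented as `(start, end)` token spans; the function names `crosses` and `bracket_scores` and the toy spans are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Assumption: a constituent is an unlabeled (start, end) token span, end exclusive.
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """Return True if the two spans overlap without one containing the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def bracket_scores(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, int]:
    """Compute bracket precision, recall, and crossing-bracket count for one sentence."""
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    # A predicted bracket counts once if it crosses at least one gold bracket.
    crossing = sum(1 for p in predicted if any(crosses(p, g) for g in gold))
    return precision, recall, crossing


# Toy five-token noun phrase (hypothetical spans, for illustration only).
gold = {(0, 5), (0, 2), (2, 5), (3, 5)}
pred = {(0, 5), (0, 3), (3, 5)}
precision, recall, crossing = bracket_scores(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} crossing={crossing}")
```

Under these assumptions, an average cross-bracketing value would be the mean of the per-sentence `crossing` counts over the evaluation set.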