Detecting hedge cues and their scope in biomedical text with conditional random fields

Authors:
Shashank Agarwal;Hong Yu
Affiliations:
Medical Informatics, University of Wisconsin-Milwaukee, Milwaukee, WI, USA;Department of Health Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI, USA and Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
Venue:
Journal of Biomedical Informatics
Year:
2010

Abstract

Objective: Hedging is frequently used in both the biological literature and clinical notes to denote uncertainty or speculation. Text-mining applications need to detect hedge cues and their scope; otherwise, uncertain events are incorrectly identified as factual events. However, because of the complexity of language, identifying hedge cues and their scope in a sentence is not a trivial task. Our objective was to develop an algorithm that automatically detects hedge cues and their scope in biomedical literature.

Methodology: We used conditional random fields (CRFs), a supervised machine-learning algorithm, to train models that detect hedge cue phrases and their scope in biomedical literature. The models were trained on the publicly available BioScope corpus. We evaluated the CRF models on identifying hedge cue phrases and their scope by calculating recall, precision and F1-score, and compared them with three competitive baseline systems.

Results: Our best CRF-based model performed statistically significantly better than the baseline systems. In biological literature it achieved F1-scores of 88% and 86% in detecting hedge cue phrases and their scope, respectively; in clinical notes it achieved F1-scores of 93% and 90%, respectively.

Conclusions: Our approach is robust, as it can identify hedge cues and their scope in both biological and clinical text. To benefit text-mining applications, our system is publicly available as a Java API and as an online application at http://hedgescope.askhermes.org. To our knowledge, this is the first publicly available system to detect hedge cues and their scope in biomedical literature.
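
The evaluation described under Methodology can be illustrated with a small sketch. The abstract does not state how sentences were encoded for the CRF or how matches were counted, so everything below is an assumption made for illustration: the BIO labeling scheme, the exact-span matching criterion, the example sentence, and the helper names spans_to_bio, bio_to_spans and precision_recall_f1 are all hypothetical and are not taken from the authors' system.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def spans_to_bio(n_tokens: int, spans: List[Span], tag: str) -> List[str]:
    """Encode token spans as BIO labels (assumed scheme, e.g. B-CUE / I-CUE / O)."""
    labels = ["O"] * n_tokens
    for start, end in spans:
        labels[start] = f"B-{tag}"
        for i in range(start + 1, end):
            labels[i] = f"I-{tag}"
    return labels


def bio_to_spans(labels: List[str]) -> List[Span]:
    """Decode a BIO label sequence back into (start, end) spans."""
    spans: List[Span] = []
    start = None
    for i, label in enumerate(labels):
        # A new B- tag or an O tag closes any span that is currently open.
        if label.startswith("B-") or label == "O":
            if start is not None:
                spans.append((start, i))
                start = None
        if label.startswith("B-"):
            start = i
        elif label.startswith("I-") and start is None:
            start = i  # tolerate an I- tag with no preceding B- tag
    if start is not None:
        spans.append((start, len(labels)))
    return spans


def precision_recall_f1(gold: List[Span], predicted: List[Span]) -> Tuple[float, float, float]:
    """Exact-match span scoring (an assumed criterion, not the paper's stated one)."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sentence with two hedge cues: "suggest" and "may".
tokens = ["These", "results", "suggest", "that", "X", "may", "activate", "Y", "."]
gold_cues = [(2, 3), (5, 6)]
gold_labels = spans_to_bio(len(tokens), gold_cues, "CUE")

# Pretend a model found only the first cue.
predicted_labels = ["O", "O", "B-CUE", "O", "O", "O", "O", "O", "O"]
predicted_cues = bio_to_spans(predicted_labels)

precision, recall, f1 = precision_recall_f1(gold_cues, predicted_cues)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the single predicted cue is correct but one gold cue is missed, so precision is 1.0 and recall is 0.5. Scope detection could be scored the same way by treating each scope as a span, although whether the authors scored scopes by exact span or per token is not stated in the abstract.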