HedgeHunter: a system for hedge detection and uncertainty classification

Authors: David Clausen
Affiliations: Stanford University, Stanford, CA
Venue: CoNLL '10: Shared Task Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task
Year: 2010

Abstract

With the dramatic growth of scientific publishing, Information Extraction (IE) systems are becoming an increasingly important tool for large-scale data analysis. Hedge detection and uncertainty classification are important components of a high-precision IE system. This paper describes a two-part supervised system that classifies words as hedge or non-hedge and sentences as certain or uncertain in biomedical and Wikipedia data. In the first stage, our system trains a logistic regression classifier to detect hedges based on lexical and part-of-speech collocation features. In the second stage, we use the output of the hedge classifier to generate sentence-level features based on the number of hedge cues, the identity of hedge cues, and a bag-of-words feature vector, and use these features to train a logistic regression classifier for sentence-level uncertainty. With the resulting classification, an IE system can then discard facts and relations extracted from uncertain sentences or treat them as appropriately doubtful. We present results for in-domain training and testing and for cross-domain training and testing based on a simple union of training sets.
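
The two feature-extraction stages named in the abstract can be illustrated with a minimal sketch. The abstract gives only the feature families (lexical and part-of-speech collocations for words; hedge-cue count, hedge-cue identity, and bag-of-words for sentences), so everything else here is an assumption made for illustration: the function names `token_features` and `sentence_features`, the one-token context window, the feature-name strings, and the example sentence with its hand-assigned hedge flags. It is not the paper's actual feature set. The resulting dictionaries would still need to be vectorized and passed to a logistic regression learner, which is omitted.

```python
from collections import Counter


def token_features(tokens, pos_tags, i, window=1):
    """Stage 1 (assumed templates): lexical and POS collocation features
    for the token at index i, using a symmetric context window."""
    feats = {f"word={tokens[i].lower()}": 1, f"pos={pos_tags[i]}": 1}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset:+d}]={tokens[j].lower()}"] = 1
            feats[f"pos[{offset:+d}]={pos_tags[j]}"] = 1
        else:
            # Mark positions that fall outside the sentence boundary.
            feats[f"pad[{offset:+d}]"] = 1
    return feats


def sentence_features(tokens, hedge_flags):
    """Stage 2 (assumed templates): sentence-level features built from
    the per-token hedge decisions produced by stage 1."""
    cues = [t.lower() for t, is_hedge in zip(tokens, hedge_flags) if is_hedge]
    feats = Counter()
    feats["num_hedge_cues"] = len(cues)   # number of hedge cues
    for cue in cues:
        feats[f"cue={cue}"] += 1          # identity of each hedge cue
    for t in tokens:
        feats[f"bow={t.lower()}"] += 1    # bag-of-words counts
    return dict(feats)


# Hypothetical example sentence; POS tags and hedge flags are hand-assigned
# here, whereas in a real pipeline the flags would come from the stage-1
# classifier's predictions.
tokens = ["These", "results", "may", "suggest", "a", "role"]
pos_tags = ["DT", "NNS", "MD", "VB", "DT", "NN"]
hedge_flags = [False, False, True, True, False, False]

tok_feats = token_features(tokens, pos_tags, 2)      # features for "may"
sent_feats = sentence_features(tokens, hedge_flags)  # features for the sentence
```

Keeping the stages as separate functions mirrors the pipeline described above: the sentence-level classifier sees only the hedge decisions and the raw tokens, so stage 1 can be retrained or swapped without changing the stage 2 feature code.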