Exploring hedge identification in biomedical literature

  • Authors:
  • Ben Medlock

  • Affiliations:
  • University of Cambridge, Computer Laboratory, William Gates Building, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2008

Abstract

We investigate automatic identification of speculative language, or 'hedging', in scientific literature from the biomedical domain. Our contributions include a precise description of the task including annotation guidelines, theoretical analysis and discussion. We show that good agreement can be achieved using our guidelines and present a publicly available benchmark dataset for the task. We argue for separation of the acquisition and classification phases in semi-supervised machine learning, and present a probabilistic acquisition model which is evaluated both theoretically and experimentally. We explore the impact of different sample representations on classification accuracy across the learning curve and demonstrate the effectiveness of using machine learning for the hedge identification task. Finally, we examine the errors made by our approach and point toward avenues for future research.