A machine-learning approach to negation and speculation detection in clinical texts

  • Authors:
  • Noa P. Cruz Díaz; Manuel J. Maña López; Jacinto Mata Vázquez; Victoria Pachón Álvarez

  • Affiliations:
  • Department of Information Technology, University of Huelva, Huelva, Spain (all authors)

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2012

Abstract

Detecting negative and speculative information is essential in most biomedical text-mining tasks, where these language forms are used to express impressions, hypotheses, or explanations of experimental results. Our research focuses on developing a system based on machine-learning techniques that identifies negation and speculation signals and their scope in clinical texts. The proposed system works in two consecutive phases: first, a classifier decides whether each token in a sentence is a negation or speculation signal; second, another classifier determines, at sentence level, which tokens are affected by the signals previously identified. The system was trained and evaluated on the clinical texts of the BioScope corpus, a freely available resource consisting of medical and biological texts: full-length articles, scientific abstracts, and clinical reports. The results obtained by our system were compared with those of two other systems, one based on regular expressions and the other on machine learning, and our system outperformed both. In the signal detection task, the F score was 97.3% in negation and 94.9% in speculation. In the scope-finding task, a token was counted as correctly classified if it had been properly identified as being inside or outside the scope of all the negation signals present in the sentence. Our proposal showed an F score of 93.2% in negation and 80.9% in speculation. Additionally, the percentage of correct scopes (those with all their tokens correctly classified) was evaluated, yielding F scores of 90.9% in negation and 71.9% in speculation. © 2012 Wiley Periodicals, Inc.
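
The two scope-finding measures described in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that each sentence is represented as a list of booleans (True when a token lies inside a scope), that the token-level F score is computed for the "inside scope" class, and that a scope counts as correct only when every token in the sentence matches the gold annotation. The function names `token_f_score` and `exact_scope_rate` are hypothetical.

```python
from typing import List, Tuple

Labels = List[List[bool]]  # one list per sentence; True = token is inside a scope


def token_f_score(gold: Labels, pred: Labels) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F score for the 'inside scope' class."""
    tp = fp = fn = 0
    for gold_sent, pred_sent in zip(gold, pred):
        for g, p in zip(gold_sent, pred_sent):
            if g and p:
                tp += 1          # token correctly placed inside the scope
            elif p and not g:
                fp += 1          # token wrongly placed inside the scope
            elif g and not p:
                fn += 1          # token wrongly left outside the scope
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


def exact_scope_rate(gold: Labels, pred: Labels) -> float:
    """Fraction of sentences whose scope has all tokens correctly classified."""
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


if __name__ == "__main__":
    # Toy data (invented for illustration, not taken from BioScope).
    gold = [[False, True, True, False], [True, True, False]]
    pred = [[False, True, True, False], [True, False, False]]
    print(token_f_score(gold, pred))
    print(exact_scope_rate(gold, pred))
```

The second measure is stricter than the first: a single misclassified token makes the whole scope incorrect, which is consistent with the exact-scope figures reported above being lower than the token-level ones.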