A machine-learning approach to negation and speculation detection in clinical texts

Authors:
Noa P. Cruz Díaz; Manuel J. Maña López; Jacinto Mata Vázquez; Victoria Pachón Álvarez
Affiliations:
Department of Information Technology, University of Huelva, Huelva, Spain (all authors)
Venue:
Journal of the American Society for Information Science and Technology
Year:
2012

Abstract

Detecting negative and speculative information is essential in most biomedical text-mining tasks where these language forms are used to express impressions, hypotheses, or explanations of experimental results. Our research is focused on developing a system based on machine-learning techniques that identifies negation and speculation signals and their scope in clinical texts. The proposed system works in two consecutive phases: first, a classifier decides whether each token in a sentence is a negation/speculation signal or not. Then another classifier determines, at sentence level, the tokens that are affected by the signals previously identified. The system was trained and evaluated on the clinical texts of the BioScope corpus, a freely available resource consisting of medical and biological texts: full-length articles, scientific abstracts, and clinical reports. The results obtained by our system were compared with those of two different systems, one based on regular expressions and the other based on machine learning. Our system outperformed both of them. In the signal detection task, the F-score was 97.3% in negation and 94.9% in speculation. In the scope-finding task, a token was correctly classified if it had been properly identified as being inside or outside the scope of all the negation signals present in the sentence. Our proposal showed an F-score of 93.2% in negation and 80.9% in speculation. Additionally, the percentage of correct scopes (those with all their tokens correctly classified) was evaluated, yielding F-scores of 90.9% in negation and 71.9% in speculation. © 2012 Wiley Periodicals, Inc.
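
The two scope-finding measures mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the boolean "inside scope" encoding, and the toy data are assumptions made for illustration, and the percentage of correct scopes is computed here as a plain ratio rather than as the F-score the abstract reports.

```python
from typing import List, Tuple

Scope = List[bool]  # assumed encoding: True = token is inside the scope


def token_prf(gold: List[Scope], pred: List[Scope]) -> Tuple[float, float, float]:
    """Token-level precision, recall and F-score for the 'inside scope' class."""
    tp = fp = fn = 0
    for gold_scope, pred_scope in zip(gold, pred):
        for g, p in zip(gold_scope, pred_scope):
            if g and p:
                tp += 1          # token correctly placed inside the scope
            elif p and not g:
                fp += 1          # token wrongly placed inside the scope
            elif g and not p:
                fn += 1          # token wrongly left outside the scope
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


def pct_correct_scopes(gold: List[Scope], pred: List[Scope]) -> float:
    """Percentage of scopes whose tokens are all classified correctly."""
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)


# Toy data (hypothetical): two scopes, the second predicted one token short.
gold = [[False, True, True, True], [False, False, True, True, False]]
pred = [[False, True, True, True], [False, False, True, False, False]]

precision, recall, f_score = token_prf(gold, pred)
pcs = pct_correct_scopes(gold, pred)
```

In this toy case a single missed token lowers token-level recall only slightly but makes the whole second scope count as incorrect, which is consistent with the percentage of correct scopes being the stricter of the two figures reported above.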