Detecting negative and speculative information is essential in most biomedical text-mining tasks, where these language forms are used to express impressions, hypotheses, or explanations of experimental results. Our research focuses on developing a machine-learning system that identifies negation and speculation signals and their scope in clinical texts. The proposed system works in two consecutive phases: first, a classifier decides whether each token in a sentence is a negation/speculation signal. Then a second classifier determines, at sentence level, which tokens are affected by the signals previously identified. The system was trained and evaluated on the clinical texts of the BioScope corpus, a freely available resource consisting of medical and biological texts: full-length articles, scientific abstracts, and clinical reports. The results were compared with those of two other systems, one based on regular expressions and the other on machine learning, and our system outperformed both. In the signal detection task, the F-score was 97.3% for negation and 94.9% for speculation. In the scope-finding task, a token was counted as correctly classified if it had been properly identified as being inside or outside the scope of all the negation signals present in the sentence. Our proposal achieved an F-score of 93.2% for negation and 80.9% for speculation. Additionally, the percentage of correct scopes (those with all their tokens correctly classified) was evaluated, yielding F-scores of 90.9% for negation and 71.9% for speculation. © 2012 Wiley Periodicals, Inc.
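The two scope-finding measures described in the abstract can be illustrated with a minimal sketch. It assumes each scope is represented as a list of booleans, one per token, marking whether that token is inside the scope; the function names `token_prf` and `exact_scope_rate` and the toy data are hypothetical and not taken from the paper. The abstract does not spell out how the scope-level F-score is derived, so the second function only computes the plain fraction of scopes whose tokens are all classified correctly.

```python
from typing import List, Tuple

Scope = List[bool]  # one flag per token: True if the token is inside the scope


def token_prf(gold: List[Scope], pred: List[Scope]) -> Tuple[float, float, float]:
    """Token-level precision, recall and F1 for the 'inside scope' class."""
    tp = fp = fn = 0
    for gold_scope, pred_scope in zip(gold, pred):
        for g, p in zip(gold_scope, pred_scope):
            if g and p:
                tp += 1          # token correctly placed inside the scope
            elif p and not g:
                fp += 1          # token wrongly placed inside the scope
            elif g and not p:
                fn += 1          # token wrongly left outside the scope
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def exact_scope_rate(gold: List[Scope], pred: List[Scope]) -> float:
    """Fraction of scopes in which every token is classified correctly."""
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


# Toy data (hypothetical): two scopes, the second one missing a token.
gold_scopes = [[False, True, True, False], [True, True, False]]
pred_scopes = [[False, True, True, False], [True, False, False]]

precision, recall, f1 = token_prf(gold_scopes, pred_scopes)
print(f"token-level P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
print(f"exact scopes: {exact_scope_rate(gold_scopes, pred_scopes):.3f}")
```

The token-level measure gives partial credit for a scope that is mostly right, whereas the exact-scope measure counts a scope only if every token matches, which is consistent with the lower scope-level figures reported above.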