Multi-dimensional classification of biomedical text

Authors:
Hagit Shatkay;Fengxia Pan;Andrey Rzhetsky;W. John Wilbur
Venue:
Bioinformatics
Year:
2008

Citing 0
Cited 12

Semantic annotation of papers: interface & enrichment tool (SAPIENT)
BioNLP '09 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

Zones of conceptualisation in scientific papers: a window to negative and speculative statements
NeSp-NLP '10 Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

Speculation and negation annotation in natural language texts: what the case of BioScope might (not) reveal
NeSp-NLP '10 Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

Evaluating a meta-knowledge annotation scheme for bio-events
NeSp-NLP '10 Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

Identifying the information structure of scientific abstracts: an investigation of three different schemes
BioNLP '10 Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

Detecting hedge cues and their scope in biomedical text with conditional random fields
Journal of Biomedical Informatics

Mining methodologies from NLP publications: A case study in automatic terminology recognition
Computer Speech and Language

A weakly-supervised approach to argumentative zoning of scientific documents
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Modality and negation: An introduction to the special issue
Computational Linguistics

Are you sure that this happened? Assessing the factuality degree of events in text
Computational Linguistics

Cross-genre and cross-domain detection of semantic uncertainty
Computational Linguistics

A three-way perspective on scientific discourse annotation for knowledge extraction
ACL '12 Proceedings of the Workshop on Detecting Structure in Scholarly Discourse

Quantified Score

Hi-index	3.84

Abstract

Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD), database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective if they can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks.

Results: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, where three individual annotators independently tagged each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice.

Contact: shatkay@cs.queensu.ca
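
As an illustration of the setup described under Results, where three annotators tag each sentence along several dimensions, the sketch below combines the tags for one sentence into a single label per dimension by strict majority vote. This is only a minimal sketch under stated assumptions: the abstract does not say how the annotations were reconciled, so majority voting is an assumption, and the dimension names ("evidence", "certainty") and label values are hypothetical placeholders rather than the scheme's actual categories.

```python
from collections import Counter
from typing import Dict, List, Optional

# One annotator's tags for one sentence: dimension name -> label.
# Dimension names and label values used here are hypothetical.
Annotation = Dict[str, str]


def majority_label(labels: List[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count * 2 > len(labels) else None


def aggregate(annotations: List[Annotation]) -> Dict[str, Optional[str]]:
    """Combine several annotators' tags, one dimension at a time.

    Assumption: each dimension is reconciled independently of the others.
    """
    dimensions = sorted({dim for ann in annotations for dim in ann})
    return {
        dim: majority_label([ann[dim] for ann in annotations if dim in ann])
        for dim in dimensions
    }


if __name__ == "__main__":
    three_annotators = [
        {"evidence": "explicit", "certainty": "high"},
        {"evidence": "explicit", "certainty": "medium"},
        {"evidence": "none", "certainty": "low"},
    ]
    print(aggregate(three_annotators))
```

Dimensions on which the annotators fail to reach a majority come back as None, so a downstream classifier could either skip those training examples or treat disagreement as its own signal; which is preferable depends on the task.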