A machine learning approach for the curation of biomedical literature

Authors:
Min Shi;David S. Edwin;Rakesh Menon;Lixiang Shen;Jonathan Y. K. Lim;Han Tong Loh;S. Sathiya Keerthi;Chong Jin Ong
Affiliations:
Design Technology Institute Ltd, Faculty of Engineering, National University of Singapore, Singapore;Design Technology Institute Ltd, Faculty of Engineering, National University of Singapore, Singapore;Design Technology Institute Ltd, Faculty of Engineering, National University of Singapore, Singapore;Design Technology Institute Ltd, Faculty of Engineering, National University of Singapore, Singapore;Design Technology Institute Ltd, Faculty of Engineering, National University of Singapore, Singapore;Design Technology Institute Ltd, Faculty of Engineering, National University of Singapore, Singapore and ME Department, National University of Singapore, Singapore;ME Department, National University of Singapore, Singapore;ME Department, National University of Singapore, Singapore
Venue:
ECIR'03 Proceedings of the 25th European conference on IR research
Year:
2003

Abstract

In the biomedical sciences, a vast repository of information is spread across large quantities of research papers. Researchers often need to spend considerable time reading entire papers before they can determine whether those papers should be curated (archived). In this paper, we present an automated text classification system for biomedical papers. A paper is classified according to whether it contains experimental evidence for the expression of molecular gene products for specified genes. The system performs pre-processing and data cleaning, then extracts features from the raw text, and finally classifies the paper from those features with a Naïve Bayes classifier. Our approach makes it possible to classify (and curate) biomedical papers automatically, potentially saving considerable time and resources. The system proved to be highly accurate and received an honourable mention in KDD Cup 2002 Task 1.
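The pipeline described above (pre-processing, feature extraction, Naïve Bayes classification) can be illustrated with a minimal sketch. The abstract does not specify the features or the Naïve Bayes variant used, so this example assumes plain bag-of-words tokens and a multinomial Naïve Bayes model with Laplace smoothing. The class name `MultinomialNaiveBayes`, the labels `"curate"`/`"skip"`, and the toy documents are all hypothetical and are not taken from the paper or from the KDD Cup 2002 data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude pre-processing stand-in: lowercase and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MultinomialNaiveBayes:
    """Hypothetical bag-of-words multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        label_counts = Counter(labels)
        self.classes = sorted(label_counts)
        n = len(labels)
        # Per-class word frequencies (the extracted "features").
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.word_counts[c])
        self.log_prior = {c: math.log(label_counts[c] / n) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        # Smoothed estimate of P(word | class c).
        v = len(self.vocab)
        return math.log(
            (self.word_counts[c][word] + self.alpha)
            / (self.total_words[c] + self.alpha * v)
        )

    def predict(self, doc):
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy training data (invented for illustration only).
docs = [
    "western blot showed expression of the protein in embryos",
    "in situ hybridization revealed mrna expression in the wing disc",
    "we review the history of genetic nomenclature",
    "this commentary discusses funding policy",
]
labels = ["curate", "curate", "skip", "skip"]
model = MultinomialNaiveBayes().fit(docs, labels)

print(model.predict("northern blot detected mrna expression"))  # curate
```

Working in log space avoids numerical underflow when many per-word probabilities are multiplied over a long paper. A real curation system would likely need richer pre-processing than this sketch provides, such as gene-name normalisation or section filtering.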