Figure classification in biomedical literature to elucidate disease mechanisms, based on pathways

Authors:
Natsu Ishii;Asako Koike;Yasunori Yamamoto;Toshihisa Takagi
Affiliations:
Department of Computational Biology, Graduate School of Frontier Science, The University of Tokyo, 5-1-5 Kashiwano-ha, Kashiwa, Chiba 277-8568, Japan;Central Research Laboratory, Hitachi Ltd., 1-280 Higashi-koigakubo, Kokubunji, Tokyo 185-8601, Japan;Database Center for Life Science, Research Organization of Information and Systems, 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan;Department of Computational Biology, Graduate School of Frontier Science, The University of Tokyo, 5-1-5 Kashiwano-ha, Kashiwa, Chiba 277-8568, Japan and Database Center for Life Science, Research ...
Venue:
Artificial Intelligence in Medicine
Year:
2010

Abstract

Objective: As more full-text biomedical papers become available online in digitized form, there is a need for tools that mine information from all parts of such papers. Because the figures and their legends (captions) provide important information about research outcomes, mining techniques that target them have attracted a great deal of attention. In this study, we focused on pathway figures, which illustrate signaling or metabolic pathways, because many of them are important for understanding disease mechanisms. We developed a figure classification system based on the textual information contained in biomedical papers, with the aim of acquiring such pathway figures automatically.

Materials and methods: We used full-text journal articles available on PubMed Central as our data set. We classified the figures in the data set with several supervised machine learning methods, such as decision trees and support vector machines. We compared classification performance across three cases: using only figure legends, using only the sentences in the main text that refer to the figure, and combining figure legends with those referring sentences.

Results: Figure legends alone achieved sufficiently high performance compared with previous related work. Performance with only the sentences referring to the figure in the main text was lower than with figure legends alone, indicating that the main text alone is inadequate. Combining legends with the main text had a clear effect, and additionally including the sentences immediately before and after each referring sentence improved performance dramatically.

Conclusions: We developed an automatic pathway figure classification system, based on both figure legends and the main text, that achieves a high degree of accuracy. To our knowledge, this is the first attempt to address a figure classification task using both legends and the main text, and it may provide a first stage toward efficient figure mining.
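The text inputs compared in the abstract (legend only, referring sentences only, and legend combined with referring sentences and their neighbors) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the regular expression for figure mentions, the naive sentence splitter, and the `window` parameter are all assumptions made for illustration. The resulting strings would then be passed to whatever feature extraction and classifier is chosen.

```python
import re

def mention_pattern(fig_number: int) -> re.Pattern:
    # Assumed mention styles: "Fig. 2", "Fig 2", "Figure 2", "Figure 2A".
    # The lookahead stops "Figure 2" from matching inside "Figure 21".
    return re.compile(rf"\bFig(?:ure|\.)?\s*{fig_number}(?!\d)", re.IGNORECASE)

def split_sentences(text):
    # Naive splitter (an assumption, not a real tokenizer): break after
    # ., ! or ? when followed by whitespace and a capital letter.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]

def referring_sentences(body, fig_number, window=0):
    # Return sentences that mention the figure, plus `window` sentences
    # before and after each mention, in document order without duplicates.
    sentences = split_sentences(body)
    pattern = mention_pattern(fig_number)
    keep = set()
    for i, sentence in enumerate(sentences):
        if pattern.search(sentence):
            lo = max(0, i - window)
            hi = min(len(sentences), i + window + 1)
            keep.update(range(lo, hi))
    return [sentences[i] for i in sorted(keep)]

def build_inputs(legend, body, fig_number):
    # Assemble one text input per condition described in the abstract.
    refs = referring_sentences(body, fig_number, window=0)
    context = referring_sentences(body, fig_number, window=1)
    return {
        "legend_only": legend,
        "main_text_only": " ".join(refs),
        "legend_plus_main_text": " ".join([legend] + refs),
        "legend_plus_context": " ".join([legend] + context),
    }
```

Setting `window=1` corresponds to adding the prior and following sentence around each referring sentence. A real pipeline would need a more robust sentence splitter and would have to handle plural or ranged mentions such as "Figs. 2 and 3", which this sketch ignores.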