Figure classification in biomedical literature to elucidate disease mechanisms, based on pathways

  • Authors:
  • Natsu Ishii;Asako Koike;Yasunori Yamamoto;Toshihisa Takagi

  • Affiliations:
  • Department of Computational Biology, Graduate School of Frontier Science, The University of Tokyo, 5-1-5 Kashiwano-ha, Kashiwa, Chiba 277-8568, Japan;Central Research Laboratory, Hitachi Ltd., 1-280 Higashi-koigakubo, Kokubunji, Tokyo 185-8601, Japan;Database Center for Life Science, Research Organization of Information and Systems, 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan;Department of Computational Biology, Graduate School of Frontier Science, The University of Tokyo, 5-1-5 Kashiwano-ha, Kashiwa, Chiba 277-8568, Japan and Database Center for Life Science, Research ...

  • Venue:
  • Artificial Intelligence in Medicine
  • Year:
  • 2010

Abstract

Objective: As more full-text biomedical papers become available online in digitized form, tools are needed to mine information from all parts of such papers. Because the figures and their legends or captions provide important information about research outcomes, mining techniques that target them have attracted a great deal of attention. In this study, we focused on pathway figures, which illustrate signaling or metabolic pathways, because many of them are important for understanding disease mechanisms. We developed a figure classification system based on textual information contained in biomedical papers to provide automated acquisition of such pathway figures.

Materials and methods: We used full-text journal articles available on PubMed Central as our data set. We applied several supervised machine learning methods, such as decision trees and support vector machines, to classify the figures in the data set. We compared classification performance across three cases: using only figure legends, using only the sentences in the main text that refer to the figure, and combining figure legends with those sentences.

Results: Compared with previous related work, sufficiently high performance was achieved with the figure legends alone. Performance with the sentences referring to the figure in the main text was lower than with the figure legends alone, indicating that focusing on the main text alone is inadequate. Combining the legend and the main text clearly had an effect, and additionally including the sentences immediately before and after the sentence referring to the figure dramatically improved performance.

Conclusions: We developed an automatic pathway figure classification system, based on both figure legends and the main text, that achieves quite a high degree of accuracy. To our knowledge, this is the first attempt to address a figure classification task using legends and the main text, and it may provide a first stage toward efficient figure mining.
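
The three text inputs compared in the abstract (legend only, referring sentences only, and the combination, optionally widened to neighboring sentences) can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression for figure mentions, the naive sentence splitter, and the names `referring_sentences`, `build_inputs`, and `window` are all assumptions made for illustration, and the classifier itself (for example a decision tree or support vector machine) is left out.

```python
import re

# Assumed pattern for in-text figure mentions such as "Fig. 2" or "Figure 2A".
FIG_REF = re.compile(r"\bFig(?:ure|\.)?\s*(\d+)", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after . ! or ? when followed by whitespace and a capital letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]


def referring_sentences(main_text, fig_no, window=0):
    """Return sentences that mention figure `fig_no`, plus `window` neighbors on each side."""
    sentences = split_sentences(main_text)
    keep = set()
    for i, sentence in enumerate(sentences):
        if any(int(n) == fig_no for n in FIG_REF.findall(sentence)):
            lo = max(0, i - window)
            hi = min(len(sentences), i + window + 1)
            keep.update(range(lo, hi))
    # Preserve document order and avoid duplicates when windows overlap.
    return [sentences[i] for i in sorted(keep)]


def build_inputs(legend, main_text, fig_no, window=1):
    """Assemble the three text variants that a classifier could be trained on."""
    context = " ".join(referring_sentences(main_text, fig_no, window))
    return {
        "legend_only": legend,
        "main_text_only": context,
        "combined": (legend + " " + context).strip(),
    }


if __name__ == "__main__":
    # Hypothetical article text, invented for this example.
    text = (
        "We studied cell growth. The MAPK cascade is shown in Fig. 2. "
        "ERK activation follows. Results for controls are in Figure 3. We conclude."
    )
    for name, value in build_inputs("Signaling pathway of MAPK.", text, 2).items():
        print(name, "->", value)
```

Setting `window=0` keeps only the sentences that cite the figure, while `window=1` adds the prior and following sentences, which is the setting the abstract reports as giving the largest improvement. Each resulting string would then be converted to features (for instance bag-of-words counts) before being passed to a supervised classifier.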