An Image-Text Approach for Extracting Experimental Evidence of Protein-Protein Interactions in the Biomedical Literature

Authors:
Luis D. Lopez;Jingyi Yu;Cecilia N. Arighi;Manabu Torii;K. Vijay-Shanker;Hongzhan Huang;Cathy H. Wu
Affiliations:
Dept. of Computer and Information Sciences, University of Delaware, Newark, DE 19716 United States;Dept. of Computer and Information Sciences, University of Delaware, Newark, DE 19716 United States;Dept. of Computer and Information Sciences University of Delaware Newark, DE 19716 United States and Center for Bioinformatics and Computational Biology, University of Delaware Newark, DE 19711 Un ...;Dept. of Computer and Information Sciences University of Delaware Newark, DE 19716 United States and Center for Bioinformatics and Computational Biology, University of Delaware Newark, DE 19711 Un ...;Dept. of Computer and Information Sciences, University of Delaware Newark, DE 19716 United States;Dept. of Computer and Information Sciences, University of Delaware Newark, DE 19716 United States and Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711 ...;Dept. of Computer and Information Sciences, University of Delaware Newark, DE 19716 United States and Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711 ...
Venue:
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
Year:
2013

Abstract

Proteins are complex biological polymers that mediate virtually all cellular functions. Typically, these functions are modulated by protein-protein interactions (PPIs). Life scientists have made tremendous efforts to detect PPIs through different experimental approaches and to document the results in publications. On the informatics front, however, there is still no effective means of retrieving PPI information from the published literature. In this work, we present a novel framework for identifying the experimental methods used to analyze PPIs in biomedical articles. Unlike state-of-the-art approaches, which rely on text alone, we explore combining attributes from figures, figure captions, and text within figures to identify PPI experimental methods. Our work is motivated by the observation that biomedical figures often constitute direct evidence of experimental results and therefore provide information complementary to the text. We start by automatically extracting unimodal panels (subfigures) and their associated subcaptions, and then classify each subfigure into a type using a proposed hierarchical image taxonomy. Next, we combine the subfigure types with text-based features to form a hybrid feature descriptor and use it for PPI method classification. We further construct a dataset starting from a set of 2,256 documents provided by the molecular interaction database MINT. We show that our new approach outperforms the text-only solution for associating figures with PPI methods.
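
The hybrid feature descriptor mentioned in the abstract can be pictured as a concatenation of an image-type encoding and a text-feature vector. The following is only a minimal sketch of that idea, not the authors' implementation: the image-type labels, the vocabulary, and the function names (`one_hot`, `bag_of_words`, `hybrid_descriptor`) are all hypothetical, and the abstract specifies neither the actual taxonomy nor the text features used.

```python
from collections import Counter

# Hypothetical subfigure types; the paper's hierarchical taxonomy
# is not spelled out in the abstract.
IMAGE_TYPES = ["gel", "microscopy", "graph", "other"]


def one_hot(image_type, types=IMAGE_TYPES):
    """Encode a subfigure type as a one-hot vector (all zeros if unknown)."""
    return [1.0 if image_type == t else 0.0 for t in types]


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary term occurs in a subcaption."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def hybrid_descriptor(image_type, subcaption, vocabulary):
    """Concatenate image-type and text features into one feature vector."""
    return one_hot(image_type) + bag_of_words(subcaption, vocabulary)


# Hypothetical vocabulary of method-related terms.
vocabulary = ["immunoprecipitation", "blot", "yeast", "fluorescence"]
vec = hybrid_descriptor(
    "gel",
    "Western blot after immunoprecipitation of the complex",
    vocabulary,
)
print(vec)
```

A vector of this kind could then be passed to any standard classifier to predict the PPI method; the abstract does not state which classifier the authors use.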