Multiple kernel learning in protein-protein interaction extraction from biomedical literature

Authors:
Zhihao Yang;Nan Tang;Xiao Zhang;Hongfei Lin;Yanpeng Li;Zhiwei Yang
Affiliations:
Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, China;Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, China;Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, China;Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, China;Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, China;Department of Ultrasound, Oil Field Hospital of Daqing, Heilongjiang 163001, China
Venue:
Artificial Intelligence in Medicine
Year:
2011

Abstract

Objective: Knowledge about protein-protein interactions (PPIs) unveils the molecular mechanisms of biological processes. The volume and content of published biomedical literature on protein interactions are expanding rapidly, making it increasingly difficult for interaction database administrators, who are responsible for content input and maintenance, to detect and manually update protein interaction information. The objective of this work is to develop an effective approach to the automatic extraction of PPI information from biomedical literature.

Methods and materials: We present a weighted multiple kernel learning-based approach for automatic PPI extraction from biomedical literature. The approach combines the following kernels: feature-based, tree, graph and part-of-speech (POS) path. In particular, we extend the shortest path-enclosed tree (SPT) and dependency path tree to capture richer contextual information.

Results: Our experimental results show that the combination of SPT and dependency path tree extensions improves performance by almost 0.7 percentage points in F-score and 2 percentage points in area under the receiver operating characteristic curve (AUC). Combining two or more appropriately weighted individual kernels further improves performance. In both single-corpus and cross-corpus evaluation, our combined kernel achieves state-of-the-art performance with respect to comparable evaluations, with 64.41% F-score and 88.46% AUC on the AImed corpus.

Conclusions: Because different kernels calculate the similarity between two sentences from different aspects, our combined kernel can reduce the risk of missing important features. More specifically, we use a weighted linear combination of individual kernels instead of assigning the same weight to each individual kernel, thus allowing each added kernel to contribute incrementally to the performance improvement. In addition, the SPT and dependency path tree extensions can improve performance by including richer context information.
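
The weighted linear combination of kernels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two toy kernels (`bow_kernel`, `bigram_kernel`) are hypothetical stand-ins for the paper's feature-based, tree, graph and POS path kernels, the cosine normalization step is an assumption, and the weights 0.7 and 0.3 are arbitrary illustrative values, not the ones used in the paper.

```python
from collections import Counter
from math import sqrt


def bow_kernel(a, b):
    """Linear kernel on bag-of-words counts (hypothetical stand-in
    for a feature-based kernel)."""
    ca, cb = Counter(a), Counter(b)
    return float(sum(ca[t] * cb[t] for t in ca))


def bigram_kernel(a, b):
    """Counts shared adjacent-token pairs (hypothetical stand-in
    for a path-style kernel)."""
    ca = Counter(zip(a, a[1:]))
    cb = Counter(zip(b, b[1:]))
    return float(sum(ca[g] * cb[g] for g in ca))


def normalized(kernel):
    """Cosine-normalize a kernel so that k(x, x) == 1 when k(x, x) > 0.
    Normalizing before combining is an assumption of this sketch."""
    def k(a, b):
        denom = sqrt(kernel(a, a) * kernel(b, b))
        return kernel(a, b) / denom if denom > 0 else 0.0
    return k


def combine(kernels, weights):
    """Weighted linear combination: K(a, b) = sum_m w_m * K_m(a, b).
    Non-negative weights keep the result a valid kernel."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    def k(a, b):
        return sum(w * km(a, b) for w, km in zip(weights, kernels))
    return k


if __name__ == "__main__":
    s1 = "PROT1 interacts with PROT2".split()
    s2 = "PROT1 binds to PROT2".split()
    # Illustrative weights only; the paper's weights are not given here.
    k = combine([normalized(bow_kernel), normalized(bigram_kernel)], [0.7, 0.3])
    print(k(s1, s2))
```

With equal weights this reduces to the unweighted sum of kernels that the abstract contrasts against. Unequal weights let a stronger kernel dominate while a weaker one still contributes to the similarity score.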