Mining protein interactions from text using convolution kernels

Authors:
Ramanathan Narayanan;Sanchit Misra;Simon Lin;Alok Choudhary
Affiliations:
Department of Electrical Engineering and Computer Science, Northwestern University;Department of Electrical Engineering and Computer Science, Northwestern University;Feinberg School of Medicine, Northwestern University;Department of Electrical Engineering and Computer Science, Northwestern University
Venue:
PAKDD'09 Proceedings of the 13th Pacific-Asia international conference on Knowledge discovery and data mining: new frontiers in applied data mining
Year:
2009

Citing 10
Cited 0

The nature of statistical learning theory

The nature of statistical learning theory
Making large-scale support vector machine learning practical

Advances in kernel methods
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Automatic Extraction of Biological Information from Scientific Text: Protein-Protein Interactions

Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology
A new statistical parser based on bigram lexical dependencies

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Biomedical named entity recognition using two-phase model based on SVMs

Journal of Biomedical Informatics - Special issue: Named entity recognition in biomedicine
Online Predicted Human Interaction Database

Bioinformatics
Head-Driven Statistical Models for Natural Language Parsing

Computational Linguistics
Dependency tree kernels for relation extraction

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Multi-way relation classification: application to protein-protein interactions

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

As the sizes of biomedical literature databases increase, there is an urgent need to develop intelligent systems that automatically discover Protein-Protein interactions from text. Despite resource-intensive efforts to create manually curated interaction databases, the sheer volume of biological literature databases makes it impossible to achieve significant coverage. In this paper, we describe a scalable hierarchical Support Vector Machine(SVM) based framework to efficiently mine protein interactions with high precision. In addition, we describe a convolution tree-vector kernel based on syntactic similarity of natural language text to further enhance the mining process. By using the inherent syntactic similarity of interaction phrases as a kernel method, we are able to significantly improve the classification quality. Our hierarchical framework allows us to reduce the search space dramatically with each stage, while sustaining a high level of accuracy. We test our framework on a corpus of over 10000 manually annotated phrases gathered from various sources. The convolution kernel technique identifies sentences describing interactions with a precision of 95% and a recall of 92%, yielding significant improvements over previous machine learning techniques.