The nature of statistical learning theory
The nature of statistical learning theory
Making large-scale support vector machine learning practical
Advances in kernel methods
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Automatic Extraction of Biological Information from Scientific Text: Protein-Protein Interactions
Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology
A new statistical parser based on bigram lexical dependencies
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Biomedical named entity recognition using two-phase model based on SVMs
Journal of Biomedical Informatics - Special issue: Named entity recognition in biomedicine
Online Predicted Human Interaction Database
Bioinformatics
Head-Driven Statistical Models for Natural Language Parsing
Computational Linguistics
Dependency tree kernels for relation extraction
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Multi-way relation classification: application to protein-protein interactions
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Hi-index | 0.00 |
As the sizes of biomedical literature databases increase, there is an urgent need to develop intelligent systems that automatically discover Protein-Protein interactions from text. Despite resource-intensive efforts to create manually curated interaction databases, the sheer volume of biological literature databases makes it impossible to achieve significant coverage. In this paper, we describe a scalable hierarchical Support Vector Machine(SVM) based framework to efficiently mine protein interactions with high precision. In addition, we describe a convolution tree-vector kernel based on syntactic similarity of natural language text to further enhance the mining process. By using the inherent syntactic similarity of interaction phrases as a kernel method, we are able to significantly improve the classification quality. Our hierarchical framework allows us to reduce the search space dramatically with each stage, while sustaining a high level of accuracy. We test our framework on a corpus of over 10000 manually annotated phrases gathered from various sources. The convolution kernel technique identifies sentences describing interactions with a precision of 95% and a recall of 92%, yielding significant improvements over previous machine learning techniques.