Semi-supervised Prediction of Protein Interaction Sentences Exploiting Semantically Encoded Metrics

Authors:
Tamara Polajnar; Mark Girolami
Affiliations:
University of Glasgow, Glasgow, Scotland G12 8QQ; University of Glasgow, Glasgow, Scotland G12 8QQ
Venue:
PRIB '09 Proceedings of the 4th IAPR International Conference on Pattern Recognition in Bioinformatics
Year:
2009

Abstract

Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automating this task through classification is one of the key goals of text mining (TM). However, the labelled PPI corpora required to train classifiers are generally small. To overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations; it has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process.
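
The idea of folding word similarity from unlabelled text into a sentence kernel can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a simple sliding-window co-occurrence model as a stand-in for the semantic language model, cosine similarity between co-occurrence vectors as the word similarity, and the generic semantic kernel form K = X S Xᵀ as the kernel transformation. The function names, the window size, and the toy sentences are all hypothetical.

```python
import numpy as np


def cooccurrence_similarity(docs, vocab, window=2):
    """Word-by-word cosine similarity from sliding-window co-occurrence counts.

    Assumption: a HAL-style window model stands in for the semantic
    language model; out-of-vocabulary tokens are dropped before windowing.
    """
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for tokens in docs:
        ids = [index[t] for t in tokens if t in index]
        for pos, wi in enumerate(ids):
            # Count every in-vocabulary word within `window` positions to the right.
            for wj in ids[pos + 1 : pos + 1 + window]:
                counts[wi, wj] += 1.0
                counts[wj, wi] += 1.0
    # Treat each word as co-occurring with itself so that no row is all zeros.
    counts += np.eye(len(vocab))
    unit = counts / np.linalg.norm(counts, axis=1, keepdims=True)
    # A Gram matrix of unit vectors: symmetric and positive semi-definite.
    return unit @ unit.T


def bag_of_words(sentences, vocab):
    """Term-count matrix with one row per sentence and one column per vocabulary word."""
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(sentences), len(vocab)))
    for row, tokens in enumerate(sentences):
        for t in tokens:
            if t in index:
                X[row, index[t]] += 1.0
    return X


def semantic_kernel(X, S):
    """Sentence kernel K = X S X^T, re-weighting features by word similarity."""
    return X @ S @ X.T


# Hypothetical toy data: two unlabelled sentences and two one-word "labelled" sentences.
vocab = ["raf", "binds", "interacts", "ras"]
unlabelled = [["raf", "binds", "ras"], ["raf", "interacts", "ras"]]
labelled = [["binds"], ["interacts"]]

S = cooccurrence_similarity(unlabelled, vocab)
X = bag_of_words(labelled, vocab)
K_linear = X @ X.T               # plain bag-of-words kernel
K_semantic = semantic_kernel(X, S)
```

In this toy case the two labelled sentences share no words, so the linear kernel gives them zero similarity, while the semantic kernel gives a positive value because "binds" and "interacts" appear in similar contexts in the unlabelled text. A kernel matrix of this kind could then be passed to any classifier that accepts precomputed kernels.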