Mixture of logistic models and an ensemble approach for protein-protein interaction extraction

Authors:
Lindsey Bell; Jinfeng Zhang; Xufeng Niu
Affiliations:
Florida State University, Tallahassee, FL; Florida State University, Tallahassee, FL; Florida State University, Tallahassee, FL
Venue:
Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
Year:
2011



Abstract

Automatic extraction of protein-protein interaction (PPI) information from scientific literature is important for building PPI databases, studying biological networks, and discovering new biological knowledge through automatic hypothesis generation. In this paper, we present a new method for PPI extraction based on a mixture of logistic models. The method automatically clusters interaction words (words that describe the interactions of protein pairs) into groups with similar grammatical properties, and a logistic model is fitted for each cluster. The directionality of an interaction is essential information for many protein interactions and is important for building directed biological networks. Most current PPI extraction methods do not extract directional information, in part because few corpora annotate directionality. We introduce a new corpus, PICAD, that includes directional annotation and can be used to evaluate PPI extraction tools. The corpus is available at http://stat.fsu.edu/~jinfeng/resources/PICAD.txt. In addition, we propose an ensemble approach that combines logistic regression, Bayesian networks, and SVM to identify PPIs. We show that an ensemble of classifiers captures different features in the text, and we report an F-measure of 75.7% on the new corpus.
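
The abstract does not say how the three classifiers are combined, so the following is only a minimal sketch of one common choice, simple majority voting over binary labels, together with the standard F-measure (harmonic mean of precision and recall). The function names `majority_vote` and `f_measure`, the label vectors, and the voting rule itself are illustrative assumptions and are not taken from the paper or from PICAD.

```python
# Hypothetical sketch: majority-vote ensemble plus F-measure on toy labels.
# Label convention (assumed): 1 = sentence describes an interaction, 0 = it does not.

def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    `predictions` is a list of equal-length lists, one per classifier.
    With an odd number of classifiers and binary labels there are no ties.
    """
    n_classifiers = len(predictions)
    return [1 if 2 * sum(votes) > n_classifiers else 0
            for votes in zip(*predictions)]

def f_measure(gold, predicted):
    """Balanced F-measure: 2PR / (P + R), with P = precision and R = recall."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up gold labels and classifier outputs for six candidate sentences.
gold      = [1, 0, 1, 1, 0, 0]
logistic  = [1, 0, 0, 1, 0, 1]   # stand-in for a logistic regression model
bayes_net = [1, 1, 1, 0, 0, 0]   # stand-in for a Bayesian network
svm       = [0, 0, 1, 1, 0, 0]   # stand-in for an SVM

ensemble = majority_vote([logistic, bayes_net, svm])

for name, labels in [("logistic", logistic), ("bayes_net", bayes_net),
                     ("svm", svm), ("ensemble", ensemble)]:
    print(f"{name:10s} F-measure = {f_measure(gold, labels):.3f}")
```

In this toy data each classifier makes errors on different sentences, so the vote corrects them. That is the intuition behind combining classifiers that capture different features, though the real gain depends on how correlated their errors are.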