A rich feature vector for protein-protein interaction extraction from multiple corpora

Authors:
Makoto Miwa;Rune Sætre;Yusuke Miyao;Jun'ichi Tsujii
Affiliations:
The University of Tokyo, Bunkyo-ku, Tokyo, Japan;The University of Tokyo, Bunkyo-ku, Tokyo, Japan;The University of Tokyo, Bunkyo-ku, Tokyo, Japan;The University of Tokyo, Bunkyo-ku, Tokyo, Japan and University of Manchester, UK and National Center for Text Mining, UK
Venue:
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1
Year:
2009

Abstract

Because protein-protein interaction (PPI) extraction from text is important, many corpora have been proposed, each with slightly different definitions of proteins and PPI. Since no single corpus is large enough to saturate a machine learning system, it is necessary to learn from multiple corpora. In this paper, we propose a solution to this challenge. We designed a rich feature vector and applied a support vector machine modified for corpus weighting (SVM-CW) to PPI extraction from multiple corpora. The rich feature vector, built from multiple useful kernels, expresses the information that is important for PPI extraction, and a system using it was shown to be both faster and more accurate than the original kernel-based system, even when using only a single corpus. SVM-CW learns from one corpus while using the other corpora for support. Although simple, SVM-CW is more effective than other methods that have previously been applied successfully to other NLP tasks. With the feature vector and SVM-CW, our system achieved the best performance among all state-of-the-art PPI extraction systems reported so far.
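
The corpus-weighting idea can be illustrated with a minimal sketch. This is not the authors' SVM-CW implementation: it assumes that corpus weighting amounts to scaling each example's hinge-loss cost by a weight chosen for its corpus, and it trains a linear SVM by plain subgradient descent. The function names, the weights, and the toy data are all hypothetical.

```python
import random


def train_corpus_weighted_svm(examples, corpus_cost, epochs=200, lr=0.05,
                              reg=0.01, seed=0):
    """Fit a linear SVM by subgradient descent on a corpus-weighted hinge loss.

    examples    -- list of (features, label, corpus); label is +1 or -1
    corpus_cost -- dict: corpus name -> misclassification cost (assumed scheme)
    """
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    b = 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y, corpus in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization: shrink the weights a little at every step.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # Hinge-loss subgradient step, scaled by the corpus cost.
                step = lr * corpus_cost[corpus] * y
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step
    return w, b


def predict(model, x):
    """Return +1 or -1 for feature vector x under the model (w, b)."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy data: a small "target" corpus and a larger "source" corpus whose
# annotation convention disagrees with it on the same inputs.
target = [([1.0, 0.0], 1, "target"), ([-1.0, 0.0], -1, "target")]
source = [([1.0, 0.0], -1, "source"), ([-1.0, 0.0], 1, "source")] * 3
examples = target + source

# Same training data, two different weightings of the corpora.
target_led = train_corpus_weighted_svm(examples, {"target": 1.0, "source": 0.1})
source_led = train_corpus_weighted_svm(examples, {"target": 0.1, "source": 1.0})

print(predict(target_led, [1.0, 0.0]), predict(source_led, [1.0, 0.0]))
```

In this toy setting, giving the target corpus the larger cost makes the model follow the target labels even though the source corpus contributes more examples, and swapping the costs makes it follow the source labels. In practice the costs would be tuned on held-out target data rather than fixed by hand.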