Feature generation and representations for protein-protein interaction classification

Authors:
Man Lan;Chew Lim Tan;Jian Su
Affiliations:
East China Normal University, Shanghai, China and Institute for Infocomm Research, Singapore;School of Computing, National University of Singapore, Singapore;Institute for Infocomm Research, Singapore
Venue:
Journal of Biomedical Informatics
Year:
2009

Abstract

Automatically detecting articles relevant to protein-protein interaction (PPI) is a crucial step in large-scale biological database curation. Previous work adopted POS tagging, shallow parsing and sentence splitting techniques, but achieved worse performance than the simple bag-of-words representation. In this paper, we generated and investigated multiple types of feature representations in order to further improve performance on the PPI text classification task. Besides the traditional domain-independent bag-of-words approach and term weighting methods, we also explored domain-dependent features, i.e. protein-protein interaction trigger keywords, protein named entities and advanced ways of incorporating Natural Language Processing (NLP) output. The integration of these multiple features was evaluated on the BioCreAtIvE II corpus. The experimental results showed that both the advanced way of using NLP output and the integration of bag-of-words and NLP output improved text classification performance. Specifically, in comparison with the best performance achieved in the BioCreAtIvE II IAS, the feature-level and classifier-level integration of multiple features improved classification performance by 2.71% and 3.95%, respectively.
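
The difference between feature-level and classifier-level integration can be illustrated with a minimal sketch. It is not the authors' implementation: the trigger stems, the whitespace tokenizer, the function names and the weighted-average combination rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical trigger stems; the actual keyword list is not given in the abstract.
TRIGGERS = ("interact", "bind", "complex")


def bag_of_words(text, vocabulary):
    """Return a term-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def trigger_features(text):
    """Count the tokens that start with each trigger stem."""
    tokens = text.lower().split()
    return [sum(tok.startswith(stem) for tok in tokens) for stem in TRIGGERS]


def feature_level(text, vocabulary):
    """Feature-level integration: concatenate the vectors for a single classifier."""
    return bag_of_words(text, vocabulary) + trigger_features(text)


def classifier_level(scores, weights=None):
    """Classifier-level integration: combine the scores of separate classifiers.

    A weighted average is assumed here; other combination rules are possible.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


vocab = ["protein", "binds", "kinase"]
combined = feature_level("Protein A binds kinase B", vocab)
ensemble_score = classifier_level([0.5, 1.0])
```

In the feature-level case one classifier is trained on the concatenated vector, whereas in the classifier-level case each feature type has its own classifier and only the output scores are merged.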