k-Information Gain Scaled Nearest Neighbors: A Novel Approach to Classifying Protein-Protein Interaction-Related Documents

Authors:
Kyle H. Ambert; Aaron M. Cohen
Affiliations:
Oregon Health & Science University, Portland; Oregon Health & Science University, Portland
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2012

Abstract

Although publicly accessible databases containing protein-protein interaction (PPI)-related information are important resources for bench and in silico researchers alike, the time and effort required to keep them up to date is often burdensome. Text-mining tools drawn from machine learning can help by identifying relevant PPI publications. Here, we describe and evaluate two document classification algorithms that we submitted to the BioCreative II.5 PPI Classification Challenge Task. This task asked participants to design classifiers that identify documents containing PPI-related information in the primary literature, and then evaluated the submitted classifiers against one another. One of our systems was the best-performing system submitted to the challenge task overall. It uses a novel approach to k-nearest neighbor classification, which we describe here; we also compare its performance to that of two support vector machine-based classification systems, one of which was also evaluated in the challenge task.
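The abstract does not spell out how the k-nearest neighbor approach works. As a rough illustration of the idea named in the title (scaling features by information gain before finding nearest neighbors), here is a minimal Python sketch. It assumes binary term-presence features, each weighted by its information gain with respect to a binary relevance label, cosine similarity between documents, and a plain majority vote among the k nearest training documents. The function names, the toy documents, and all of these modeling choices are hypothetical; this is not the authors' method, which may differ in its feature weighting, similarity measure, and decision rule.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of splitting the training set on term presence/absence."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


def scaled_vector(doc, ig):
    """Sparse vector: each present term is weighted by its information gain (assumed scheme)."""
    return {t: ig[t] for t in doc if ig.get(t, 0.0) > 0.0}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(train_docs, train_labels, query, k=3):
    """Hypothetical IG-scaled kNN: majority vote over the k most similar training docs."""
    vocab = set().union(*train_docs)
    ig = {t: information_gain(train_docs, train_labels, t) for t in vocab}
    train_vecs = [scaled_vector(d, ig) for d in train_docs]
    q = scaled_vector(query, ig)
    nearest = sorted(
        ((cosine(q, v), y) for v, y in zip(train_vecs, train_labels)), reverse=True
    )[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


# Toy training set (invented for illustration): 1 = PPI-relevant, 0 = not relevant.
train_docs = [
    {"protein", "binds", "interaction"},
    {"interaction", "complex", "binds"},
    {"protein", "complex", "yeast"},
    {"gene", "expression", "tissue"},
    {"expression", "mouse", "tissue"},
    {"gene", "mouse", "knockout"},
]
train_labels = [1, 1, 1, 0, 0, 0]

print(knn_predict(train_docs, train_labels, {"binds", "complex", "protein"}))  # → 1
print(knn_predict(train_docs, train_labels, {"mouse", "tissue", "gene"}))  # → 0
```

Weighting each term by its information gain means that terms which split relevant from non-relevant training documents dominate the similarity computation, while terms carrying little class information (here, rare words such as "yeast") contribute less to which neighbors are found.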