Tackling class imbalance and data scarcity in literature-based gene function annotation

Authors:
Mathieu Blondel;Kazuhiro Seki;Kuniaki Uehara
Affiliations:
Kobe University, Kobe, Japan;Kobe University, Kobe, Japan;Kobe University, Kobe, Japan
Venue:
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Year:
2011

Abstract

In recent years, a number of machine learning approaches to literature-based gene function annotation have been proposed. However, because of issues such as a lack of labeled data, class imbalance, and computational cost, they have usually been unable to outperform simpler approaches based on string matching. In this paper, we propose a principled machine learning approach based on kernel classifiers. We show that kernels can address the task's inherent data scarcity by embedding additional knowledge, and we propose a simple yet effective solution to class imbalance. In experiments on the TREC Genomics Track data, our approach achieves a better F1-score than two state-of-the-art approaches based on string matching and cross-species information.
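The abstract does not specify which kernels the authors use or what their class-imbalance remedy is. The following is only a minimal, hypothetical sketch of two generic ideas the abstract alludes to. First, a sum of two valid kernels is itself a valid kernel, so a block of "additional knowledge" features can be folded into a text kernel. Second, instead of keeping a fixed decision threshold, the threshold on classifier scores can be tuned to maximize F1, which matters when positives are rare. The function names, the split of feature vectors into a text block and a knowledge block, and the toy scores are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Plain dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def sum_kernel(k_text: Kernel, k_extra: Kernel, split: int) -> Kernel:
    """Hypothetical combination: the first `split` dimensions are assumed to be
    text features, the rest extra-knowledge features. A sum of valid kernels
    is again a valid (positive semidefinite) kernel."""
    def k(x: Vector, y: Vector) -> float:
        return k_text(x[:split], y[:split]) + k_extra(x[split:], y[split:])
    return k


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def best_threshold(scores: Sequence[float],
                   y_true: Sequence[bool]) -> Tuple[Optional[float], float]:
    """Try every observed score as a threshold (predict positive if score >= t)
    and return the threshold with the highest F1 on the given data."""
    best_t: Optional[float] = None
    best_f1 = -1.0
    for t in sorted(set(scores)):
        preds: List[bool] = [s >= t for s in scores]
        f = f1_score(y_true, preds)
        if f > best_f1:
            best_t, best_f1 = t, f
    return best_t, best_f1


if __name__ == "__main__":
    k = sum_kernel(linear_kernel, linear_kernel, split=2)
    print(k([1, 0, 2, 1], [1, 1, 1, 0]))  # 1 (text block) + 2 (extra block)

    # Toy classifier scores for six documents, three of them positive.
    scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
    labels = [True, True, False, True, False, False]
    fixed = f1_score(labels, [s >= 0.5 for s in scores])
    tuned_t, tuned_f1 = best_threshold(scores, labels)
    print(fixed, tuned_t, round(tuned_f1, 3))  # 0.8 0.3 0.857
```

In practice the threshold would be chosen on held-out or cross-validated data rather than on the same examples used for evaluation; tuning on the toy data here only illustrates the mechanics.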