Identifying rare classes with sparse training data

  • Authors:
  • Mingwu Zhang;Wei Jiang;Chris Clifton;Sunil Prabhakar

  • Affiliations:
  • Department of Computer Science, Purdue University, West Lafayette, IN;Department of Computer Science, Purdue University, West Lafayette, IN;Department of Computer Science, Purdue University, West Lafayette, IN;Department of Computer Science, Purdue University, West Lafayette, IN

  • Venue:
  • DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
  • Year:
  • 2007

Abstract

Building models and learning patterns from a collection of data are essential tasks for decision making and dissemination of knowledge. One common way to extract knowledge is to build a classifier. However, when the training dataset is sparse, it is difficult to build an accurate classifier. This is especially true in biological science, as biological data are hard to produce and error-prone. Through empirical results, this paper shows the challenges in building an accurate classifier with a sparse biological training dataset. Our findings indicate the inadequacies of well-known classification techniques. Although certain clustering techniques, such as seeded k-Means, show some promise, there is still room for further improvement. In addition, we propose a novel idea that could be used to produce a more balanced classifier when training data samples are very limited.
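
For readers unfamiliar with seeded k-Means, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes the common formulation in which the few labeled samples ("seeds") are used only to initialize one centroid per class, after which ordinary k-Means iterations run over all points. The function name `seeded_kmeans`, the helper `mean_point`, and the toy data are hypothetical and chosen for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def seeded_kmeans(points, seeds, max_iter=100):
    """Cluster `points`, initializing one centroid per class from `seeds`.

    points: list of coordinate tuples (labeled seeds are assumed to be included).
    seeds:  dict mapping a class label to its few labeled points.
    Returns (assignment, centroids); assignment[i] is the label for points[i].
    """
    labels = sorted(seeds)
    # Initialization step: each centroid is the mean of that class's seeds.
    centroids = {lab: mean_point(seeds[lab]) for lab in labels}
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(labels, key=lambda lab: dist(p, centroids[lab])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Recompute each centroid from its current members.
        for lab in labels:
            members = [p for p, a in zip(points, assignment) if a == lab]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[lab] = mean_point(members)
    return assignment, centroids


# Toy data (made up): one labeled seed per class, including a "rare" class.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9)]
seeds = {"common": [(0.0, 0.1)], "rare": [(5.0, 5.1)]}
assignment, centroids = seeded_kmeans(points, seeds)
print(assignment)
```

Because the seeds fix which cluster corresponds to which class, even a class with a single labeled sample gets its own cluster, which is one plausible reason such methods can help when labeled data are sparse.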