Dataset Complexity and Gene Expression Based Cancer Classification

  • Authors:
  • Oleg Okun; Helen Priisalu

  • Affiliations:
  • University of Oulu, Oulu 90014, Finland; Tallinn University of Technology, Tallinn 19086, Estonia

  • Venue:
  • WILF '07 Proceedings of the 7th international workshop on Fuzzy Logic and Applications: Applications of Fuzzy Sets Theory
  • Year:
  • 2007

Abstract

In supervised classification, dataset complexity determines how difficult a given dataset is to classify. Since complexity is a nontrivial issue, it is typically characterized by a number of measures rather than a single one. In this paper, we explore the complexity of three gene expression datasets used for two-class cancer classification. We demonstrate that estimating dataset complexity before performing the actual classification may provide a hint as to whether to apply a single best nearest neighbour classifier or an ensemble of nearest neighbour classifiers.
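
The abstract does not say which complexity measures the authors use. As a rough illustration of what such a measure can look like, the sketch below computes one widely used measure from the literature, the maximum Fisher's discriminant ratio over features, for a two-class dataset. The function names and the toy expression values are hypothetical and are not taken from the paper; a larger value suggests that at least one feature separates the two classes well.

```python
from statistics import mean, pvariance


def fisher_ratio(feature_a, feature_b):
    """Fisher's discriminant ratio of one feature across two classes."""
    mu_a, mu_b = mean(feature_a), mean(feature_b)
    denom = pvariance(feature_a) + pvariance(feature_b)
    if denom == 0:
        # Degenerate case: no within-class spread in either class.
        return float("inf") if mu_a != mu_b else 0.0
    return (mu_a - mu_b) ** 2 / denom


def max_fisher_ratio(class_a, class_b):
    """Maximum per-feature Fisher ratio; rows are samples, columns are features."""
    n_features = len(class_a[0])
    return max(
        fisher_ratio([row[j] for row in class_a], [row[j] for row in class_b])
        for j in range(n_features)
    )


# Made-up expression values for illustration only (2 genes, 3 samples per class).
tumour = [[1.0, 5.0], [2.0, 5.5], [3.0, 4.5]]
normal = [[1.5, 9.0], [2.5, 9.5], [2.0, 8.5]]

score = max_fisher_ratio(tumour, normal)
print(f"max Fisher ratio: {score:.2f}")
```

In this toy data the first gene has identical class means and contributes a ratio of zero, while the second gene separates the classes clearly and determines the score. How such a score would map to the choice between a single classifier and an ensemble is not specified in the abstract, so the sketch stops at computing the measure.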