A Study on the Importance of Differential Prioritization in Feature Selection Using Toy Datasets

  • Authors: Chia Huey Ooi; Shyh Wei Teng; Madhu Chetty
  • Affiliations: Faculty of Information Technology, Monash University, Australia (all authors)
  • Venue: PRIB '08: Proceedings of the Third IAPR International Conference on Pattern Recognition in Bioinformatics
  • Year: 2008

Abstract

Previous empirical work has shown the effectiveness of differential prioritization in feature selection prior to molecular classification. We now seek to establish the theoretical basis for differential prioritization by mathematically analyzing the characteristics of predictor sets obtained from realistic toy datasets using different values of the degree of differential prioritization (DDP). These analyses apply analytical measures, such as the distance between classes, to the predictor sets. We demonstrate that the optimal value of the DDP yields a predictor set consisting of classes of features that are well separated and highly correlated with the target classes, a characteristic of a truly optimal predictor set. The analyses confirm mathematically that the DDP must be adjusted to the dataset of interest, indicating that the DDP-based feature selection technique is superior to both simplistic rank-based selection and state-of-the-art equal-priorities scoring methods. Applying similar analyses to real-life multiclass microarray datasets, we obtain further proof of the theoretical significance of the DDP for practical applications.
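
The abstract does not define the DDP or the scoring function, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the DDP is an exponent alpha in [0, 1] that trades relevance V (how well features separate the target classes) against antiredundancy U (how uncorrelated the selected features are with each other), scoring a candidate predictor set as V^alpha * U^(1 - alpha). The helper names (pearson, relevance, select), the specific relevance and antiredundancy measures, and the tiny toy dataset are all hypothetical choices made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def relevance(feature, labels):
    """Assumed relevance measure: between-class sum of squares / total sum of squares."""
    n = len(feature)
    grand = sum(feature) / n
    total = sum((v - grand) ** 2 for v in feature)
    if total == 0:
        return 0.0
    between = 0.0
    for c in set(labels):
        members = [v for v, l in zip(feature, labels) if l == c]
        mean_c = sum(members) / len(members)
        between += len(members) * (mean_c - grand) ** 2
    return between / total

def select(features, labels, size, alpha):
    """Greedy forward selection maximizing V**alpha * U**(1 - alpha).

    alpha = 1 ranks purely by relevance; alpha = 0 purely by antiredundancy.
    """
    rel = [relevance(f, labels) for f in features]
    chosen = []
    while len(chosen) < size:
        best, best_score = None, -1.0
        for j in range(len(features)):
            if j in chosen:
                continue
            cand = chosen + [j]
            # V: mean relevance of the candidate set.
            v = sum(rel[k] for k in cand) / len(cand)
            # U: mean of 1 - |correlation| over all feature pairs (1.0 for a single feature).
            pairs = [(a, b) for i, a in enumerate(cand) for b in cand[i + 1:]]
            if pairs:
                u = sum(1 - abs(pearson(features[a], features[b]))
                        for a, b in pairs) / len(pairs)
            else:
                u = 1.0
            score = (v ** alpha) * (u ** (1 - alpha))
            if score > best_score:  # ties keep the lowest feature index
                best, best_score = j, score
        chosen.append(best)
    return chosen

# Hypothetical two-class toy data: 8 samples, 3 features.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # separates the two classes perfectly
    [0, 0, 0, 1, 1, 1, 1, 1],  # relevant, but largely redundant with feature 0
    [0, 1, 0, 1, 0, 1, 1, 1],  # weakly relevant, less correlated with feature 0
]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, select(features, labels, 2, alpha))
```

On this toy data, changing alpha changes which second feature joins the predictor set: a relevance-only setting favors the redundant feature, while lower values favor the less correlated one. That is the kind of dataset-dependent behavior the abstract argues for when it says the DDP should be adjusted to the dataset of interest.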