Combining feature selection and feature construction to improve concept learning for high dimensional data

  • Authors:
  • Blaise Hanczar

  • Affiliations:
  • Lim&Bio, University Paris 13, Bobigny, France

  • Venue:
  • SARA'05 Proceedings of the 6th international conference on Abstraction, Reformulation and Approximation
  • Year:
  • 2005

Abstract

This paper describes and experimentally analyses a new dimension reduction method for microarray data. Microarrays, which allow simultaneous measurement of the expression levels of thousands of genes in a given situation (tissue, cell or time), produce data that pose particular machine-learning problems. The disproportion between the number of attributes (tens of thousands) and the number of examples (hundreds) requires a reduction in dimension. While gene/class mutual information is often used to filter the genes, we propose an approach that takes gene-pair/class information into account. A gene selection heuristic based on this principle is proposed, as well as an automatic feature-construction procedure that forces the learning algorithms to make use of these gene pairs. We report significant improvements in accuracy on several public microarray databases.
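
To illustrate why gene-pair/class information can reveal what single-gene filtering misses, the following minimal sketch computes empirical mutual information on a toy data set. It is not the paper's selection heuristic, which the abstract does not specify. The function names, the binary discretization of expression levels, and the XOR-style toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, cs):
    """Empirical mutual information I(X; C), in bits, of two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_c = Counter(cs)
    count_xc = Counter(zip(xs, cs))
    # Sum p(x, c) * log2( p(x, c) / (p(x) * p(c)) ) over observed (x, c) pairs.
    return sum(
        (n_xc / n) * log2(n_xc * n / (count_x[x] * count_c[c]))
        for (x, c), n_xc in count_xc.items()
    )


def pair_mutual_information(xs, ys, cs):
    """I((X, Y); C): treat the gene pair as one joint discrete variable."""
    return mutual_information(list(zip(xs, ys)), cs)


# Hypothetical toy data: two discretized genes (0 = low, 1 = high expression)
# and a class label that depends only on their combination (XOR).
gene_a = [0, 0, 1, 1, 0, 0, 1, 1]
gene_b = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [a ^ b for a, b in zip(gene_a, gene_b)]

mi_a = mutual_information(gene_a, labels)
mi_b = mutual_information(gene_b, labels)
mi_ab = pair_mutual_information(gene_a, gene_b, labels)

print(f"I(A; C)      = {mi_a:.3f} bits")
print(f"I(B; C)      = {mi_b:.3f} bits")
print(f"I((A, B); C) = {mi_ab:.3f} bits")
```

In this toy case each gene alone carries zero bits about the class, so a filter based on gene/class mutual information would discard both, while the pair carries one full bit. Real microarray data are continuous and noisy, so any practical use of this idea would need a discretization or density-estimation step that is not shown here.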