Applying Gaussian distribution-dependent criteria to decision trees for high-dimensional microarray data

  • Authors:
  • Raymond Wan;Ichigaku Takigawa;Hiroshi Mamitsuka

  • Affiliations:
  • Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan;Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan;Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan

  • Venue:
  • VDMB'06 Proceedings of the First international conference on Data Mining and Bioinformatics
  • Year:
  • 2006

Abstract

Biological data presents unique problems for data analysis because of its high dimensionality. Microarray data is one example, and it has received much attention in recent years. Machine learning algorithms such as support vector machines (SVMs) are well suited to microarray data because of their high classification accuracy. However, sometimes the information being sought is a list of the genes that best separate the classes, and not a classification rate. Decision trees are one alternative: they do not perform as well as SVMs, but their output is easily understood by non-specialists. A major obstacle to applying current decision tree implementations to high-dimensional data sets is their tendency to assign the same score to multiple attributes. In this paper, we propose two distribution-dependent criteria for decision trees to improve their usefulness for microarray classification.
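The tied-score problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method or data: it assumes a standard information-gain criterion with threshold splits, synthetic Gaussian noise in place of real expression values, and hypothetical names (`entropy`, `best_information_gain`) and sizes (20 samples, 2000 attributes) chosen only for illustration. With so few samples, a split can produce only a limited number of distinct class counts, so many attributes necessarily receive identical scores.

```python
import math
import random
from collections import Counter


def entropy(pos, total):
    """Binary entropy (in bits) of a node with `pos` positives out of `total`."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_information_gain(values, labels):
    """Highest information gain over all threshold splits of one attribute."""
    n = len(labels)
    total_pos = sum(labels)
    parent = entropy(total_pos, n)
    order = sorted(range(n), key=lambda i: values[i])
    best = 0.0
    left_pos = 0
    for k in range(1, n):  # the left branch holds the k smallest values
        left_pos += labels[order[k - 1]]
        if values[order[k - 1]] == values[order[k]]:
            continue  # no threshold separates two equal values
        right_pos = total_pos - left_pos
        children = (k / n) * entropy(left_pos, k) \
            + ((n - k) / n) * entropy(right_pos, n - k)
        best = max(best, parent - children)
    return round(best, 12)  # rounding guards against floating-point noise


# Assumed, synthetic setting: few samples, many attributes, no real signal.
random.seed(0)
n_samples, n_attributes = 20, 2000
labels = [0] * 10 + [1] * 10
data = [[random.gauss(0.0, 1.0) for _ in range(n_samples)]
        for _ in range(n_attributes)]

scores = [best_information_gain(column, labels) for column in data]
score_counts = Counter(scores)
top_score = max(score_counts)

print(f"{len(score_counts)} distinct scores across {n_attributes} attributes")
print(f"{score_counts[top_score]} attribute(s) share the top score {top_score:.4f}")
```

Because the score depends only on how many samples of each class fall on either side of the threshold, the number of distinct scores is bounded by the sample size rather than by the number of attributes. A criterion that also uses the attribute values themselves, such as one based on an assumed distribution, has more room to break such ties.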