Applying Gaussian distribution-dependent criteria to decision trees for high-dimensional microarray data

Authors:
Raymond Wan;Ichigaku Takigawa;Hiroshi Mamitsuka
Affiliations:
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan;Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan;Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan
Venue:
VDMB'06 Proceedings of the First international conference on Data Mining and Bioinformatics
Year:
2006

Abstract

Biological data presents unique problems for data analysis because of its high dimensionality. Microarray data is one example that has received much attention in recent years. Machine learning algorithms such as support vector machines (SVMs) are ideal for microarray data due to their high classification accuracy. However, sometimes the information being sought is a list of genes that best separates the classes, not a classification rate. Decision trees are an alternative: they do not perform as well as SVMs, but their output is easily understood by non-specialists. A major obstacle to applying current decision tree implementations to high-dimensional data sets is their tendency to assign the same score to multiple attributes. In this paper, we propose two distribution-dependent criteria for decision trees to improve their usefulness for microarray classification.
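
The tie problem mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper and does not implement the proposed criteria. It assumes an entropy-based information gain over binary thresholds, the kind of score used by common decision tree learners, and the gene names and expression values are made up for illustration. With few samples, any attribute that separates the classes perfectly receives the same maximal score, however wide or narrow the gap between the classes is.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def best_info_gain(values, labels):
    """Best information gain over all binary thresholds on one continuous attribute."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = 0.0
    for i in range(1, n):
        # A threshold can only be placed between two distinct values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        best = max(best, gain)
    return best


# Hypothetical data: six samples, two classes, three made-up "genes".
labels = ["A", "A", "A", "B", "B", "B"]
genes = {
    "gene_wide_margin": [0.1, 0.2, 0.3, 5.1, 5.2, 5.3],    # classes far apart
    "gene_narrow_margin": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5],  # classes barely apart
    "gene_noisy": [0.1, 1.4, 0.3, 1.2, 0.2, 1.5],          # classes overlap
}

gains = {name: best_info_gain(values, labels) for name, values in genes.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
```

In this sketch the wide-margin and narrow-margin genes tie, because the score depends only on how the threshold partitions the labels and not on how the expression values are distributed around it. A criterion that takes the distribution of values into account could, in principle, tell the two apart.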