Biological data presents unique problems for data analysis because of its high dimensionality. Microarray data is one example that has received much attention in recent years. Machine learning algorithms such as support vector machines (SVMs) are well suited to microarray data because of their high classification accuracy. However, sometimes the information being sought is a list of the genes that best separate the classes, not a classification rate. Decision trees are one alternative: they do not perform as well as SVMs, but their output is easily understood by non-specialists. A major obstacle to applying current decision tree implementations to high-dimensional data sets is their tendency to assign the same score to multiple attributes. In this paper, we propose two distribution-dependent criteria for decision trees to improve their usefulness for microarray classification.
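The tie problem can be illustrated with a minimal sketch. It assumes plain information gain over binary thresholds on continuous attributes, which is neither C4.5's gain ratio nor the criteria proposed in this paper. The function names, the sample and attribute counts, and the random data are all hypothetical. With only 8 samples split evenly between two classes, a threshold score depends only on the order of the class labels along the sorted attribute, so at most C(8,4) = 70 distinct scores are possible no matter how many attributes are scored.

```python
import math
import random
from collections import Counter


def entropy(pos, neg):
    """Binary entropy (in bits) of a node holding pos and neg samples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def best_information_gain(values, labels):
    """Best information gain over all binary thresholds on one attribute."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(labels)
    base = entropy(total_pos, n - total_pos)
    best = 0.0
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left_neg = i - left_pos
        right_pos = total_pos - left_pos
        right_neg = (n - i) - right_pos
        gain = (base
                - (i / n) * entropy(left_pos, left_neg)
                - ((n - i) / n) * entropy(right_pos, right_neg))
        best = max(best, gain)
    return round(best, 12)  # rounding removes floating-point noise


def tie_summary(n_samples=8, n_attributes=500, seed=0):
    """Score random attributes (hypothetical data) and count equal scores."""
    rng = random.Random(seed)
    half = n_samples // 2
    labels = [0] * half + [1] * (n_samples - half)
    scores = [
        best_information_gain([rng.random() for _ in range(n_samples)], labels)
        for _ in range(n_attributes)
    ]
    return Counter(scores)


counts = tie_summary()
top_score = max(counts)
print(len(counts), "distinct scores;",
      counts[top_score], "attributes share the top score", top_score)
```

In this setting, many of the 500 random attributes necessarily receive identical scores, which is the kind of ambiguity that makes it hard to rank genes with a purely count-based splitting criterion.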