Classification of microarrays with kNN: comparison of dimensionality reduction methods

Authors:
Sampath Deegalla; Henrik Boström
Affiliations:
Dept. of Computer and Systems Sciences, Stockholm University and Royal Institute of Technology, Kista, Sweden; School of Humanities and Informatics, University of Skövde, Skövde, Sweden
Venue:
IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning
Year:
2007

Abstract

Dimensionality reduction can often improve the performance of the k-nearest neighbor classifier (kNN) on high-dimensional data sets such as microarrays. How the choice of dimensionality reduction method affects the predictive performance of kNN on microarray data remains an open issue. To address it, four common dimensionality reduction methods, Principal Component Analysis (PCA), Random Projection (RP), Partial Least Squares (PLS) and Information Gain (IG), are compared on eight microarray data sets. All four methods yield more accurate classifiers than kNN on the raw attributes. PCA and PLS reach their best accuracies with fewer components than the other two methods, whereas RP needs far more components than the others to outperform kNN on the non-reduced data set. No method can be concluded to generally outperform the others, although PLS is superior on all four binary classification tasks. The main conclusion of the study is that the choice of dimensionality reduction method can be of major importance when classifying microarrays with kNN.
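
The kind of comparison described in the abstract can be illustrated with a minimal Python sketch that runs kNN on raw attributes and on randomly projected attributes. This is not the authors' experimental setup: the toy data generator, the function names (`make_toy_data`, `random_projection_matrix`, `project`, `knn_predict`, `knn_accuracy`), the Gaussian projection matrix, the train/test split and all parameter values are assumptions made for illustration only. RP is shown because it needs nothing beyond the standard library.

```python
import math
import random
from collections import Counter


def make_toy_data(n_samples, n_features, n_signal, rng):
    """Hypothetical stand-in for microarray data: few samples, many features.

    Labels alternate 0, 1, 0, 1, ...; only the first n_signal features
    are shifted for class 1, the remaining features are pure noise.
    """
    X, y = [], []
    for s in range(n_samples):
        label = s % 2
        row = [rng.uniform(-0.5, 0.5) for _ in range(n_features)]
        for j in range(n_signal):
            row[j] += 6.0 * label
        X.append(row)
        y.append(label)
    return X, y


def random_projection_matrix(n_features, n_components, rng):
    """Gaussian random matrix with entries drawn from N(0, 1/n_components)."""
    scale = 1.0 / math.sqrt(n_components)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
            for _ in range(n_features)]


def project(X, matrix):
    """Map each sample from n_features down to n_components dimensions."""
    columns = list(zip(*matrix))  # one tuple of weights per output component
    return [[sum(x * w for x, w in zip(row, col)) for col in columns]
            for row in X]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test samples whose predicted label is correct."""
    hits = sum(knn_predict(train_X, train_y, q, k) == label
               for q, label in zip(test_X, test_y))
    return hits / len(test_y)


rng = random.Random(0)
X, y = make_toy_data(n_samples=40, n_features=200, n_signal=10, rng=rng)
split = 30  # assumed split: first 30 samples train, last 10 test

print("raw attributes:",
      knn_accuracy(X[:split], y[:split], X[split:], y[split:]))
for n_components in (2, 10, 50):
    R = random_projection_matrix(200, n_components, rng)
    Z = project(X, R)  # the same matrix is applied to train and test samples
    acc = knn_accuracy(Z[:split], y[:split], Z[split:], y[split:])
    print(f"RP with {n_components} components:", acc)
```

PCA, PLS or an information-gain ranking would take the place of `project` in the loop, with the number of retained components or attributes varied in the same way. Results on this synthetic data say nothing about the eight microarray data sets used in the study.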