Direct integration of microarrays for selecting informative genes and phenotype classification

Authors:
Youngmi Yoon; Jongchan Lee; Sanghyun Park; Sangjay Bien; Hyun Cheol Chung; Sun Young Rha
Affiliations:
Department of Computer Science, Yonsei University, 134 Sinchon-dong, Seodaemun-gu, Seoul 120-749, South Korea and Department of Information Technology, Gachon University of Medicine and Science, S ...; Department of Computer Science, Yonsei University, 134 Sinchon-dong, Seodaemun-gu, Seoul 120-749, South Korea; Department of Computer Science, Yonsei University, 134 Sinchon-dong, Seodaemun-gu, Seoul 120-749, South Korea; Department of Computer Science, Yonsei University, 134 Sinchon-dong, Seodaemun-gu, Seoul 120-749, South Korea; Department of Internal Medicine, Cancer Metastasis Research Center, Yonsei University College of Medicine, South Korea; Department of Internal Medicine, Cancer Metastasis Research Center, Yonsei University College of Medicine, South Korea
Venue:
Information Sciences: an International Journal
Year:
2008

Abstract

The ability to measure thousands of gene expression values simultaneously makes microarray data very useful for phenotype classification. A major constraint in phenotype classification, however, is that the number of genes greatly exceeds the number of samples. We overcame this constraint in two ways: we increased the number of samples by integrating independently generated microarray datasets that had been designed with the same biological objectives, and we reduced the number of genes involved in classification by selecting a small set of informative genes. This allowed us to make maximal use of the abundant microarray data being accumulated by thousands of research groups while improving classification accuracy. Our goal is to develop a feature (gene) selection method that is applicable to integrated microarrays and to build a highly accurate classifier that permits straightforward biological interpretation. To this end, we propose a two-stage approach. First, we directly integrated individual microarrays by transforming each expression value into a rank value within its sample, and we identified informative genes by calculating the number of swaps needed to reach a perfectly split sequence. Second, we built a parameter-free ensemble classifier that uses only the preselected informative genes. Using this classifier, derived from a large integrated set of microarray samples, we achieved high accuracy, sensitivity, and specificity in classifying an independent test dataset.
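The abstract does not spell out the details of the first stage, so the following Python sketch is only one plausible reading of it, not the authors' implementation. It assumes that all datasets share a common gene list, that "swaps" means adjacent transpositions of class labels after the samples are ordered by one gene's rank, and that either class may come first in a perfectly split sequence. The function names (`rank_within_sample`, `integrate`, `swap_count`, `gene_scores`), the tie handling, and the toy data are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical sketch: function names, tie handling, and toy data are
# assumptions, not the authors' implementation.


def rank_within_sample(sample: Sequence[float]) -> List[int]:
    """Replace each expression value in one sample by its rank (1 = lowest).

    Ties are broken by gene position, a simplifying assumption.
    """
    order = sorted(range(len(sample)), key=lambda g: sample[g])
    ranks = [0] * len(sample)
    for r, g in enumerate(order, start=1):
        ranks[g] = r
    return ranks


def integrate(datasets: List[List[List[float]]]) -> List[List[int]]:
    """Pool samples from several datasets after within-sample rank transformation.

    Assumes every sample lists values for the same shared genes in the same
    order; raw scales may differ between datasets, ranks do not.
    """
    return [rank_within_sample(s) for ds in datasets for s in ds]


def swap_count(labels: Sequence[int]) -> int:
    """Minimum adjacent swaps to turn a 0/1 label sequence into a perfectly
    split one (all 0s then all 1s, or all 1s then all 0s)."""
    ones_seen = 0
    inversions = 0  # number of (1 before 0) pairs = swaps to reach 0...01...1
    for y in labels:
        if y == 1:
            ones_seen += 1
        else:
            inversions += ones_seen
    n1 = ones_seen
    n0 = len(labels) - n1
    # The opposite orientation (1...10...0) needs n0 * n1 - inversions swaps.
    return min(inversions, n0 * n1 - inversions)


def gene_scores(ranked: List[List[int]], labels: Sequence[int]) -> List[int]:
    """Swap count per gene: order samples by that gene's rank, read off labels.

    Lower scores mean the gene separates the two phenotypes more cleanly.
    """
    scores = []
    for g in range(len(ranked[0])):
        order = sorted(range(len(ranked)), key=lambda s: ranked[s][g])
        scores.append(swap_count([labels[s] for s in order]))
    return scores


if __name__ == "__main__":
    # Two toy datasets on very different scales, three shared genes each.
    ds1 = [[5.0, 1.0, 3.0], [6.0, 2.0, 1.0]]
    ds2 = [[200.0, 900.0, 100.0], [50.0, 800.0, 300.0]]
    ranked = integrate([ds1, ds2])
    labels = [1, 1, 0, 0]  # hypothetical phenotype label per pooled sample
    print(ranked)                       # [[3, 1, 2], [3, 2, 1], [2, 3, 1], [1, 3, 2]]
    print(gene_scores(ranked, labels))  # [0, 0, 1]
```

Because ranks are computed within each sample, values such as 5.0 and 200.0 from differently scaled arrays become directly comparable before pooling. Counting inversions in a single pass avoids simulating the swaps one by one; for two classes without ties, this count is the same quantity as the gene's Mann-Whitney U statistic.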