Classification of heterogeneous gene expression data

Authors:
Benny Y. M. Fung;Vincent T. Y. Ng
Affiliations:
The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong;The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Venue:
ACM SIGKDD Explorations Newsletter
Year:
2003

Citing 5
Cited 0

Statistical Pattern Recognition: A Review

IEEE Transactions on Pattern Analysis and Machine Intelligence
Class prediction and discovery using gene expression data

RECOMB '00 Proceedings of the fourth annual international conference on Computational molecular biology
Cancer classification using gene expression data

Information Systems - Special issue: Data management in bioinformatics
Machine learning in DNA microarray analysis for cancer classification

APBC '03 Proceedings of the First Asia-Pacific bioinformatics conference on Bioinformatics 2003 - Volume 19
An empirical comparison of supervised machine learning techniques in bioinformatics

APBC '03 Proceedings of the First Asia-Pacific bioinformatics conference on Bioinformatics 2003 - Volume 19

Abstract

Recent advances in DNA microarray analysis are intensively applied to disease classification, especially cancer classification. Most recently proposed gene expression classifiers can successfully classify testing samples obtained from the same microarray experiment as the training samples, under the assumption that the symmetric errors are constant among training and testing samples. However, classification performance degrades on heterogeneous testing samples obtained from different microarray experiments. In this paper, we propose "impact factors" (IFs) to measure the variations between individual classes in training samples and heterogeneous testing samples, and we integrate the IFs into classifiers for the classification of heterogeneous samples. Two publicly available lung adenocarcinoma gene expression data sets are used in our experiments to demonstrate the effectiveness of the IFs. The results show that integrating the IFs into the Golub and Slonim (GS) and k-nearest neighbors (kNN) classifiers further improves their classification accuracy on heterogeneous samples. Moreover, the classification accuracy of the integrated GS classifier is around 90%.
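For orientation, the GS classifier named in the abstract is commonly described as a weighted-voting scheme: each gene receives a signal-to-noise weight from the two class means and standard deviations, and votes according to how far a test sample's expression lies from the midpoint of the class means. The following is a minimal sketch of that baseline under stated assumptions, not the authors' implementation. The function names `train_gs` and `predict_gs` and the toy data are hypothetical, and no gene selection step is included. The abstract does not define the impact factors, so they are not modeled here.

```python
from statistics import mean, stdev

def train_gs(samples, labels):
    """Compute a (weight, boundary) pair per gene for two classes labeled 0 and 1.

    Assumes at least two samples per class and non-zero spread in each gene,
    otherwise stdev() raises or the division below fails.
    """
    n_genes = len(samples[0])
    model = []
    for g in range(n_genes):
        class0 = [s[g] for s, y in zip(samples, labels) if y == 0]
        class1 = [s[g] for s, y in zip(samples, labels) if y == 1]
        mu0, mu1 = mean(class0), mean(class1)
        # Signal-to-noise weight: positive when the gene is higher in class 0.
        weight = (mu0 - mu1) / (stdev(class0) + stdev(class1))
        # Decision boundary: midpoint between the two class means.
        boundary = (mu0 + mu1) / 2
        model.append((weight, boundary))
    return model

def predict_gs(model, sample):
    """Sum the per-gene votes; a positive total favours class 0."""
    total = sum(w * (x - b) for (w, b), x in zip(model, sample))
    return 0 if total > 0 else 1

# Hypothetical toy data: 4 training samples, 2 genes.
train = [[5.0, 1.0], [6.0, 2.0], [1.0, 5.0], [2.0, 6.0]]
labels = [0, 0, 1, 1]
model = train_gs(train, labels)
prediction = predict_gs(model, [5.5, 1.5])
```

In this sketch the weights and boundaries come entirely from the training experiment, which is why a shift in expression levels between experiments can move test samples across the boundaries. That is the kind of degradation on heterogeneous samples that the abstract describes.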