Motivation: Genome-wide gene expression measurements, as currently determined by microarray technology, can be represented mathematically as points in a high-dimensional gene expression space. Genes interact with each other in regulatory networks, which restricts cellular gene expression profiles to a certain manifold, or surface, in gene expression space. To obtain knowledge about this manifold, various dimensionality reduction methods and distance metrics are used. For data points distributed on curved manifolds, a sensible distance measure is the geodesic distance along the manifold. In this work, we examine whether an approximate geodesic distance measure captures biological similarities better than the traditionally used Euclidean distance.

Results: We computed approximate geodesic distances, determined by the Isomap algorithm, for one set of lymphoma and one set of lung cancer microarray samples. Compared with the ordinary Euclidean distance metric, this distance measure produced more instructive and biologically relevant visualizations when multidimensional scaling was applied. This suggests that the Isomap algorithm is a promising tool for the interpretation of microarray data. Furthermore, the results demonstrate the benefit and importance of taking nonlinearities in gene expression data into account.
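
To make the contrast between Euclidean and approximate geodesic distance concrete, the following is a minimal sketch of the distance step commonly associated with Isomap: build a k-nearest-neighbour graph with Euclidean edge weights, then take shortest-path lengths through that graph as the geodesic estimate. It is an illustration under stated assumptions, not the implementation used in this work. The function names (`knn_graph`, `geodesic_distances`), the neighbourhood size `k=2`, and the half-circle toy data are all hypothetical choices made for the example; no microarray data is involved.

```python
import heapq
import math


def euclidean(p, q):
    """Ordinary straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights.

    Assumption: an edge is kept if either endpoint lists the other
    among its k nearest neighbours.
    """
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (euclidean(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(points, k):
    """Approximate geodesic distances as shortest paths in the k-NN graph.

    Runs Dijkstra's algorithm from every point. Entries stay at
    math.inf if the graph is disconnected.
    """
    graph = knn_graph(points, k)
    n = len(points)
    all_dist = []
    for source in range(n):
        dist = [math.inf] * n
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in graph[u].items():
                nd = d + w
                if nd < dist[v]:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        all_dist.append(dist)
    return all_dist


# Toy manifold (hypothetical data): 50 points on a half circle of radius 1.
n = 50
points = [
    (math.cos(math.pi * i / (n - 1)), math.sin(math.pi * i / (n - 1)))
    for i in range(n)
]

geo = geodesic_distances(points, k=2)
straight = euclidean(points[0], points[-1])  # cuts across the circle, about 2.0
along = geo[0][-1]                           # follows the arc, slightly below pi

print(f"Euclidean: {straight:.3f}  approximate geodesic: {along:.3f}")
```

In this toy case the two end points are close in the Euclidean sense but far apart along the curve, which is the kind of discrepancy the geodesic measure is meant to expose. The resulting distance matrix `geo` could then be passed to a multidimensional scaling routine in place of the Euclidean distance matrix.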