Mining cancer genes with running-sum statistics

Authors:
Inho Park;Kwang H. Lee;Doheon Lee
Affiliations:
KAIST, Daejeon, South Korea;KAIST, Daejeon, South Korea;KAIST, Daejeon, South Korea
Venue:
Proceedings of the third international workshop on Data and text mining in bioinformatics
Year:
2009

Abstract

In this paper, we propose a new method for detecting candidate cancer genes in cancer microarray datasets, with the aim of developing molecular biomarkers or therapeutic targets. To address the problems that the molecular heterogeneity of cancers poses for gene prioritization, the proposed method is intended to identify genes that are over- or under-expressed not only across all cancer samples but also in just a subgroup of cancer samples. To this end, we propose the RS score for gene ranking, which is calculated with a weighted running-sum statistic on the ordered list of expression values of each gene. We apply the proposed method to publicly available prostate cancer microarray datasets and show that it places previously well-known prostate cancer-associated genes such as ERG, HPN, and AMACR at the top of the list of candidate genes. Embedding the samples, represented as vectors of the expression values of the top 20 genes, into a two-dimensional space using the commute time embedding separates normal samples from cancer samples in the independent test datasets as well as in the training datasets. We further evaluate the proposed method by estimating classification performance on the independent test datasets, where it shows better classification performance than other cancer outlier profile approaches.
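
The abstract does not give the exact definition of the RS score, so the following is only a minimal sketch of what a weighted running-sum statistic over one gene's ordered expression values could look like. It borrows the general shape of enrichment-style running sums and is not the authors' formula. The function name `running_sum_score`, the exponent `p`, the centering on the median of the normal samples, and the choice of the signed maximum deviation as the score are all assumptions made for illustration.

```python
from statistics import median


def running_sum_score(values, labels, p=1.0):
    """Illustrative weighted running-sum score for a single gene.

    values: expression values, one per sample.
    labels: 1 for a cancer sample, 0 for a normal sample.
    p:      hypothetical exponent controlling how strongly extreme
            cancer samples are weighted (an assumption, not from the paper).
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")

    normals = [v for v, y in zip(values, labels) if y == 0]
    if not normals or len(normals) == len(values):
        raise ValueError("need at least one normal and one cancer sample")

    # Assumption: center expression on the median of the normal samples.
    center = median(normals)

    # Order samples from highest to lowest expression.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)

    # Cancer samples step up by a weight proportional to their deviation
    # from the normal median; normal samples step down by a constant.
    weights = {i: abs(values[i] - center) ** p for i in order if labels[i] == 1}
    total = sum(weights.values())
    n_cancer = len(weights)
    n_normal = len(normals)

    running = 0.0
    best = 0.0
    for i in order:
        if labels[i] == 1:
            running += weights[i] / total if total > 0 else 1.0 / n_cancer
        else:
            running -= 1.0 / n_normal
        # Keep the signed deviation that is farthest from zero.
        if abs(running) > abs(best):
            best = running
    return best


# Toy gene that is over-expressed in only two of four cancer samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
subgroup_up = [9.0, 8.0, 1.0, 1.0, 1.1, 0.9, 1.0, 1.2]
score = running_sum_score(subgroup_up, labels)
```

In this sketch a large positive score suggests that some cancer samples sit at the top of the ordered list with large deviations, and a large negative score suggests the same at the bottom, so genes could be ranked by the absolute value of the score. Because the steps are weighted by deviation, a gene altered in only a subgroup of cancer samples can still receive a high score.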