Serum cancer biomarker discovery through analysis of gene expression data sets across multiple tumor and normal tissues

Authors:
Hoon Jin;Han-Chul Lee;Sung Sup Park;Yong-Su Jeong;Seon-Young Kim
Affiliations:
Medical Genomics Research Center, KRIBB, Daejeon 305-806, Republic of Korea;Medical Genomics Research Center, KRIBB, Daejeon 305-806, Republic of Korea;Aging Research Center, KRIBB, Daejeon 305-806, Republic of Korea;Department of Genetic Engineering, College of Life Science and Graduate School of Biotechnology, Kyung Hee University, Yongin-si, Gyeonggi-do 446-701, Republic of Korea;Medical Genomics Research Center, KRIBB, Daejeon 305-806, Republic of Korea and Korean Bioinformation Center, KRIBB, Daejeon 305-806, Republic of Korea and Department of Functional Genomics, Unive ...
Venue:
Journal of Biomedical Informatics
Year:
2011

Abstract

The development of convenient serum bioassays for cancer screening, diagnosis, prognosis, and monitoring of treatment is one of the top priorities in the cancer research community. Although numerous biomarker candidates have been generated by applying high-throughput technologies such as transcriptomics, proteomics, and metabolomics, few of them have been successfully validated in the clinic. Better strategies to mine omics data for successful biomarker discovery are needed. Using a data set of 22,794 tumor and normal samples across 23 tissues, we systematically analyzed current problems and challenges of serum biomarker discovery from gene expression data. We first performed tissue specificity analysis to identify genes that are both tissue-specific and up-regulated in tumors compared to controls, but identified few novel candidates. We then designed a novel computational method, the multiple normal tissues corrected differential analysis (MNTDA), to identify genes that are expected to be significantly up-regulated even after their expression in other normal tissues is considered. In a simulation study, MNTDA outperformed single tissue differential analysis combined with tissue specificity analysis. By applying MNTDA, we identified some genes as novel biomarker candidates. However, the number of potential candidates was disappointingly small, exemplifying the difficulty of finding serum cancer biomarkers. We discuss a few important points that should be considered during biomarker discovery from omics data.
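The contrast between single tissue differential analysis and a comparison corrected for other normal tissues can be illustrated with a small sketch. This is not the published MNTDA procedure, whose details the abstract does not give. It is a simplified stand-in that assumes log2 expression matrices (rows are samples, columns are genes) and scores each gene by its smallest mean difference between the tumor and any normal tissue. The function names, tissue labels, and toy values are all hypothetical.

```python
import numpy as np


def single_tissue_score(tumor, normal_same_tissue):
    """Mean log2 difference per gene: tumor vs. the matched normal tissue only."""
    return tumor.mean(axis=0) - normal_same_tissue.mean(axis=0)


def multi_normal_corrected_score(tumor, normals_by_tissue):
    """Smallest mean log2 difference per gene between tumor and any normal tissue.

    Assumption for illustration: a gene scores high only if its tumor level
    exceeds its level in every normal tissue supplied.
    """
    tumor_mean = tumor.mean(axis=0)
    diffs = np.stack(
        [tumor_mean - normal.mean(axis=0) for normal in normals_by_tissue.values()]
    )
    return diffs.min(axis=0)


# Toy data (hypothetical): two tumor samples, two genes.
tumor = np.array([[8.0, 8.0], [9.0, 9.0]])
normals = {
    # Matched normal tissue: both genes are low.
    "lung": np.array([[4.0, 4.0], [5.0, 5.0]]),
    # Another normal tissue: the second gene is highly expressed.
    "liver": np.array([[4.0, 10.0], [5.0, 11.0]]),
}

single = single_tissue_score(tumor, normals["lung"])
corrected = multi_normal_corrected_score(tumor, normals)
print("single tissue:", single)
print("corrected:", corrected)
```

In this toy case both genes look equally up-regulated against the matched normal tissue, but the second gene loses its apparent advantage once a normal tissue that expresses it strongly is taken into account, which is the kind of candidate a serum assay would struggle to use.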