ROCR: visualizing classifier performance in R
Bioinformatics
The development of convenient serum bioassays for cancer screening, diagnosis, prognosis, and treatment monitoring is one of the top priorities of the cancer research community. Although high-throughput technologies such as transcriptomics, proteomics, and metabolomics have generated numerous biomarker candidates, few of them have been successfully validated in the clinic. Better strategies for mining omics data are needed for successful biomarker discovery. Using a data set of 22,794 tumor and normal samples across 23 tissues, we systematically analyzed the current problems and challenges of discovering serum biomarkers from gene expression data. We first performed tissue specificity analysis to identify genes that are both tissue-specific and up-regulated in tumors compared with controls, but this identified few novel candidates. We then designed a novel computational method, multiple normal tissues corrected differential analysis (MNTDA), to identify genes that are expected to remain significantly up-regulated even after their expression in other normal tissues is taken into account. In a simulation study, MNTDA outperformed single-tissue differential analysis combined with tissue specificity analysis. By applying MNTDA, we identified some genes as novel biomarker candidates. However, the number of potential candidates was disappointingly small, exemplifying the difficulty of finding serum cancer biomarkers. We discuss a few important points that should be considered during biomarker discovery from omics data.
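The core idea behind correcting for multiple normal tissues can be illustrated with a toy comparison. The sketch below is a minimal illustration under stated assumptions, not the MNTDA algorithm from the study. It assumes that serum levels pool contributions from every tissue, so a useful candidate must out-express all normal tissues, not just its matched one. It replaces any statistical test with a simple difference of means on log2 expression. The function names (`single_tissue_diff`, `multi_normal_corrected_diff`, `is_candidate`), the `min_margin` threshold, and the expression values are all hypothetical.

```python
from statistics import mean


def single_tissue_diff(tumor, matched_normal):
    """Mean tumor-minus-normal difference against the matched normal tissue only."""
    return mean(tumor) - mean(matched_normal)


def multi_normal_corrected_diff(tumor, normals):
    """Tumor mean minus the highest mean expression in ANY normal tissue.

    Hypothetical simplification: because serum pools contributions from all
    tissues, the tumor must out-express every normal tissue, not just its own.
    """
    highest_normal = max(mean(values) for values in normals.values())
    return mean(tumor) - highest_normal


def is_candidate(tumor, normals, min_margin=1.0):
    """Flag a gene whose corrected difference reaches an arbitrary (hypothetical) margin."""
    return multi_normal_corrected_diff(tumor, normals) >= min_margin


# Toy log2 expression values for one gene (invented for illustration only)
tumor_liver = [8.0, 8.5, 9.0]
normals = {
    "liver": [5.0, 5.5, 6.0],
    "kidney": [9.0, 9.5, 10.0],
    "lung": [4.0, 4.0, 4.0],
}

print(single_tissue_diff(tumor_liver, normals["liver"]))  # 3.0
print(multi_normal_corrected_diff(tumor_liver, normals))  # -1.0
print(is_candidate(tumor_liver, normals))                 # False
```

In this toy case the gene looks strongly up-regulated against normal liver alone, but normal kidney expresses it even more highly, so the corrected comparison rejects it. This mirrors why single-tissue analysis can overestimate the number of usable serum markers.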