Data mining techniques for cancer detection using serum proteomic profiling
Artificial Intelligence in Medicine
Protein expression profiling is a multidisciplinary research field that holds promise for the early detection and monitoring of cancer. Surface-enhanced laser desorption/ionization (SELDI) is a mass spectrometry method and one of two widely used techniques for protein biomarker discovery in cancer research. Several algorithms exist for signal detection in mass spectra, but they are known to have poor specificity and sensitivity. Scientists therefore have to review the analyzed mass spectra manually, which is time consuming and error prone, so algorithms with improved specificity are urgently needed. We aimed to develop a peak detection method with much better specificity than the standard methods. The proposed peak detection algorithm is divided into three steps: (1) data import and preparation, (2) signal detection using an analysis of variance (ANOVA) and the associated F-statistics, and (3) classification of each computed peak cluster as significant based on a false discovery rate (FDR) specified by the user. The proposed method significantly reduces the preprocessing time of SELDI spectra, especially for large studies. The developed algorithms are implemented in R and are available as the open source packages ProSpect, rsmooth, and ProSpectGUI. The software implementation aims for high error tolerance and easy handling for users who are unfamiliar with the statistical software R. Furthermore, the modular software design allows simple extension and adaptation of the available code base in the further development of the software.
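Steps (2) and (3) of the algorithm can be illustrated with a minimal sketch. It is written in Python for brevity and is not the ProSpect code. The abstract does not say how the ANOVA is applied to the spectra or which FDR procedure is used, so the sketch assumes a one-way ANOVA over groups of intensity values and the Benjamini-Hochberg step-up procedure. The function names `f_statistic` and `benjamini_hochberg` are hypothetical. Converting an F-statistic into a p-value requires the F distribution and is left out here.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of groups of intensities.

    Assumes at least two groups, more observations than groups,
    and non-zero within-group variation.
    """
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def benjamini_hochberg(pvalues, fdr):
    """Flag p-values as significant at the given FDR (assumed BH procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank whose p-value lies under the BH threshold.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            cutoff_rank = rank
    # Everything up to that rank is declared significant (step-up rule).
    significant = [False] * m
    for i in order[:cutoff_rank]:
        significant[i] = True
    return significant


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    print(f_statistic([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]))
    print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5], fdr=0.05))
```

In this sketch each candidate peak cluster would contribute one p-value, and the user-specified FDR decides which clusters are kept as significant.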