Data mining techniques for cancer detection using serum proteomic profiling

Authors:
Lihua Li;Hong Tang;Zuobao Wu;Jianli Gong;Michael Gruidl;Jun Zou;Melvyn Tockman;Robert A. Clark
Affiliations:
Department of Radiology, College of Medicine, H. Lee Moffitt Cancer Center and Research Institute, University of South Florida, Tampa, FL 33612-4799, USA;Department of Radiology, College of Medicine, H. Lee Moffitt Cancer Center and Research Institute, University of South Florida, Tampa, FL 33612-4799, USA;Department of Radiology, College of Medicine, H. Lee Moffitt Cancer Center and Research Institute, University of South Florida, Tampa, FL 33612-4799, USA;Department of Radiology, College of Medicine, H. Lee Moffitt Cancer Center and Research Institute, University of South Florida, Tampa, FL 33612-4799, USA;Department of Interdisciplinary Oncology, H. Lee Moffitt Cancer Center and Research Institute, University of South Florida, Tampa, FL 33612-4799, USA;Department of Interdisciplinary Oncology, H. Lee Moffitt Cancer Center and Research Institute, University of South Florida, Tampa, FL 33612-4799, USA;Department of Interdisciplinary Oncology, H. Lee Moffitt Cancer Center and Research Institute, University of South Florida, Tampa, FL 33612-4799, USA;Department of Radiology, College of Medicine, H. Lee Moffitt Cancer Center and Research Institute, University of South Florida, Tampa, FL 33612-4799, USA
Venue:
Artificial Intelligence in Medicine
Year:
2004

Abstract

Objective: Pathological changes in an organ or tissue may be reflected in proteomic patterns in serum. It is possible that unique serum proteomic patterns could be used to discriminate cancer samples from non-cancer ones. Because proteomic profiles are complex, a higher-order analysis such as data mining is needed to uncover the differences between them. The objectives of this paper are (1) to briefly review the application of data mining techniques in proteomics for cancer detection/diagnosis; (2) to explore a novel analytic method with different feature selection methods; and (3) to compare the results obtained on different datasets with those reported by Petricoin et al. in terms of detection performance and selected proteomic patterns.

Methods and material: Three serum SELDI MS data sets were used to identify serum proteomic patterns that distinguish the serum of ovarian cancer cases from that of non-cancer controls. A support vector machine-based method was applied, with feature selection performed by either statistical testing or a genetic algorithm. Leave-one-out cross-validation with receiver operating characteristic (ROC) curves was used to evaluate and compare cancer detection performance.

Results and conclusions: The results showed that (1) data mining techniques can be successfully applied to ovarian cancer detection with reasonably high performance; (2) classification using features selected by the genetic algorithm consistently outperformed classification using features selected by statistical testing in terms of accuracy and robustness; and (3) the discriminatory features (proteomic patterns) can differ greatly from one selection method to another. In other words, pattern selection and its classification efficiency are highly classifier dependent. Therefore, when using data mining techniques, the discrimination of cancer from normal does not depend solely upon the identity and origin of cancer-related proteins.
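
The evaluation scheme named in the abstract (feature selection by statistical testing, leave-one-out cross-validation, ROC analysis) can be illustrated with a minimal sketch. It is not the authors' implementation. The data are synthetic, all function names are hypothetical, and a simple nearest-centroid score stands in for the support vector machine so that the example needs only the Python standard library. The Welch t-statistic is likewise an assumed choice of statistical test, since the abstract does not say which test was used.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two lists of intensities."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def top_features(X, y, k):
    """Rank features by |t| on the given samples and return the top-k indices."""
    ranked = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        ranked.append((welch_t(pos, neg), j))
    return [j for _, j in sorted(ranked, reverse=True)[:k]]


def centroid_score(X, y, feats, x):
    """Stand-in classifier (NOT an SVM): higher score means closer to class 1."""
    def centroid(label):
        rows = [row for row, lab in zip(X, y) if lab == label]
        return [mean(r[j] for r in rows) for j in feats]

    def sqdist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(feats, c))

    return sqdist(centroid(0)) - sqdist(centroid(1))


def loo_scores(X, y, k):
    """Leave-one-out CV; features are re-selected inside every fold so the
    held-out sample never influences which features are chosen."""
    scores = []
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_features(X_train, y_train, k)
        scores.append(centroid_score(X_train, y_train, feats, X[i]))
    return scores


def auc(scores, y):
    """Area under the ROC curve as P(score_pos > score_neg); ties count half."""
    pos = [s for s, lab in zip(scores, y) if lab == 1]
    neg = [s for s, lab in zip(scores, y) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n_per_class=20, n_features=50, seed=0):
    """Synthetic 'spectra': Gaussian noise, with the first three features
    shifted upward in class 1 (an assumption made purely for illustration)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in range(3):
                row[j] += 2.0 * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("LOO AUC:", auc(loo_scores(X, y, k=5), y))
```

Re-selecting features inside each fold matters with high-dimensional spectra: ranking features on the full data set first and cross-validating afterwards would leak information from the held-out sample and inflate the estimated AUC.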