Mixture classification model based on clinical markers for breast cancer prognosis

Authors:
Tao Zeng; Juan Liu
Affiliations:
School of Computer, Wuhan University, Wuhan 430079, China; School of Computer, Wuhan University, Wuhan 430079, China
Venue:
Artificial Intelligence in Medicine
Year:
2010

Abstract

Objective: Accurate prediction of cancer prognosis is critical to cancer treatment. Many prognosis models based on clinical markers exist, but few of them are satisfactory in clinical applications. With the development of microarray technologies, cancer researchers have discovered many genes that serve as new markers in gene expression data, and they have developed powerful prognosis models based on these so-called genetic biomarkers. However, applying such biomarkers still raises several problems. First, gene expression data contain a very large number of genes but only a few samples, so it is difficult to select a unified gene set for building a stable prognostic classifier. Second, for experimental and technical reasons, gene expression data contain noise and redundancy, which may lead to a prognosis predictor with poor performance. Last but not least, microarray experiments are currently so expensive that it is hard to obtain abundant samples. Therefore, in real cancer treatment applications it is practical to develop prognosis methods based mainly on conventional clinical markers. This paper aims to establish an accurate classification model for cancer prognosis that makes full use of the invaluable information in clinical data, especially the information that most existing methods ignore when they aim for high prediction accuracy.

Methods: First, this paper gives a formal description of the general classification problem and presents a novel mixture classification model that makes full use of the invaluable information in clinical data. The model is similar to traditional ensemble classification models, except that it places strict constraints on the construction of the mapping functions so that no voting process is needed.
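The key constraint of the mixture model is that each sample is labeled by exactly one mapping function rather than by a vote. A minimal sketch can illustrate the idea. It rests on an assumed reading, not the paper's formal definition: each component is a partial mapping that either returns a label or abstains (`None`), the components are consulted in a fixed order, and a total classifier sits at the end. The names `mixture_predict`, `extreme_rule`, and `threshold_rule` are hypothetical.

```python
from typing import Callable, Hashable, Optional, Sequence

Sample = Sequence[float]
# A partial mapping returns a label inside its own domain and None elsewhere.
PartialClassifier = Callable[[Sample], Optional[Hashable]]
TotalClassifier = Callable[[Sample], Hashable]


def mixture_predict(
    components: Sequence[PartialClassifier],
    default: TotalClassifier,
    x: Sample,
) -> Hashable:
    """Label x with the first component whose domain contains it.

    Hypothetical sketch: components are consulted in order and the first
    non-None answer is final, so exactly one mapping decides each sample and
    no vote is ever taken. `default` must be defined on every sample.
    """
    for component in components:
        label = component(x)
        if label is not None:
            return label
    return default(x)


# Hypothetical toy components on two numeric features in [0, 1].
def extreme_rule(x: Sample) -> Optional[str]:
    """Claims only samples with an extreme first feature; abstains otherwise."""
    if x[0] >= 0.9:
        return "poor"
    if x[0] <= 0.1:
        return "good"
    return None


def threshold_rule(x: Sample) -> str:
    """Total fallback: a simple linear threshold on both features."""
    return "poor" if x[0] + x[1] > 1.0 else "good"


samples = [(0.95, 0.0), (0.05, 0.99), (0.5, 0.6), (0.5, 0.4)]
print([mixture_predict([extreme_rule], threshold_rule, s) for s in samples])
# ['poor', 'good', 'poor', 'good']
```

Consider the second sample. The fallback alone would label it "poor" (0.05 + 0.99 > 1), but the first component claims it, so the fallback is never consulted. In a voting ensemble, the two opinions would instead be combined.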
Then, a two-layer instance of the proposed model, named MRS (Mixture of Rough set and Support vector machine), is constructed by integrating the rough set and support vector machine (SVM) classification methods. The rough set classifier acts as the first layer and identifies singular samples in the data; the SVM classifier acts as the second layer and classifies the remaining samples. Finally, MRS is used to predict prognosis on two open breast cancer datasets. The first, denoted BRC-1 hereafter, is a high-quality, publicly available dataset of 97 breast cancer tumors from node-negative patients. The second, denoted BRC-2 hereafter, uses baseline human primary breast tumor data from the LBL breast cancer cell collection and contains 174 samples.

Results: We performed two experiments, one on BRC-1 and one on BRC-2. In the first experiment, the BRC-1 dataset is divided into a training set of 78 patients (34 in the poor prognosis group and 44 in the good prognosis group) and a test set of 19 patients (12 in the poor prognosis group and 7 in the good prognosis group). After training on the training set, MRS correctly classifies all 12 patients with poor prognosis and 6 of the 7 patients with good prognosis in the test set. These results are better than those of previous studies, and even better than those of the 70-gene biomarker. In the second experiment, we construct the classifiers on the BRC-2 dataset and compare MRS with other representative methods in the Weka software using 5-fold cross-validation. The comparison shows that MRS achieves higher prediction accuracy than those methods.

Conclusions: The proposed mixture classification model can easily integrate methods with different characteristics. It can overcome the shortcomings of traditional voting-based ensemble models and thus make full use of the information in clinical data.
The experimental results show that the implemented MRS classifier predicts breast cancer prognosis more accurately than previous prognostic methods.
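The two-layer MRS scheme described under Methods can be sketched as follows. This is not the authors' implementation. Several points are assumptions:
- "Singular" samples are taken to be those matched by a certain rule from a rough-set lower approximation.
- Inputs are already-discretized tuples of clinical attributes.
- The SVM (scikit-learn's `SVC`) is trained on the full training set.

The class names `RoughSetLayer` and `MRSSketch`, the `min_support` knob, and the toy attributes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

from sklearn.svm import SVC


class RoughSetLayer:
    """First layer: certain rules read off the lower approximations.

    Hypothetical sketch. Training samples with identical (discretized)
    attribute tuples form one indiscernibility class; a class whose members
    all share one decision label lies in that label's lower approximation
    and yields a certain rule. Inconsistent (boundary) classes yield none.
    """

    def __init__(self, min_support: int = 1):
        self.min_support = min_support  # hypothetical: members needed per rule
        self.rules: Dict[Tuple, Hashable] = {}

    def fit(self, X: Sequence[Tuple], y: Sequence[Hashable]) -> "RoughSetLayer":
        classes: Dict[Tuple, List[Hashable]] = defaultdict(list)
        for xi, yi in zip(X, y):
            classes[tuple(xi)].append(yi)
        self.rules = {
            key: labels[0]
            for key, labels in classes.items()
            if len(set(labels)) == 1 and len(labels) >= self.min_support
        }
        return self

    def predict_one(self, x: Tuple) -> Optional[Hashable]:
        # None means no certain rule matches, so the sample is deferred.
        return self.rules.get(tuple(x))


class MRSSketch:
    """Two-layer mixture: rough-set rules first, SVM for everything else."""

    def __init__(self, min_support: int = 2, **svm_params):
        self.first = RoughSetLayer(min_support)
        self.second = SVC(**svm_params)

    def fit(self, X: Sequence[Tuple], y: Sequence[Hashable]) -> "MRSSketch":
        self.first.fit(X, y)
        # Assumption: the SVM is trained on the full training set.
        self.second.fit([list(x) for x in X], list(y))
        return self

    def predict(self, X: Sequence[Tuple]) -> List[Hashable]:
        out: List[Optional[Hashable]] = [None] * len(X)
        deferred = []
        for i, x in enumerate(X):
            label = self.first.predict_one(x)
            if label is None:
                deferred.append(i)
            else:
                out[i] = label
        if deferred:
            svm_labels = self.second.predict([list(X[i]) for i in deferred]).tolist()
            for i, label in zip(deferred, svm_labels):
                out[i] = label
        return out


# Hypothetical discretized attributes: (grade, size_class, er_status).
X_train = [(1, 0, 1), (1, 0, 1), (1, 0, 1), (3, 1, 0), (3, 1, 0),
           (2, 1, 1), (2, 1, 1), (2, 0, 1), (3, 0, 0)]
y_train = ["good", "good", "good", "poor", "poor", "good", "poor", "good", "poor"]
model = MRSSketch(min_support=2, kernel="linear").fit(X_train, y_train)
# First two samples are decided by rough-set rules; the last two go to the SVM.
print(model.predict([(1, 0, 1), (3, 1, 0), (2, 1, 1), (2, 0, 0)]))
```

Each sample is routed to exactly one layer, so no voting takes place. The tuple `(2, 1, 1)` falls in the boundary region (its training members disagree), and `(2, 0, 1)` has too little support for a rule. Both are therefore left to the SVM.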
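The BRC-1 test-set counts reported under Results translate into standard metrics as shown below. The snippet assumes "poor prognosis" is the positive class. The helper `binary_metrics` is hypothetical and not taken from the paper.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity and specificity from a 2x2 confusion matrix."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall on the positive (poor) class
        "specificity": tn / (tn + fp),  # recall on the negative (good) class
    }


# BRC-1 test set: 12 of 12 poor-prognosis and 6 of 7 good-prognosis patients correct.
m = binary_metrics(tp=12, fn=0, tn=6, fp=1)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.947
# sensitivity: 1.000
# specificity: 0.857
```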