The feasibility of constructing a Predictive Outcome Model for breast cancer using the tools of data mining

Authors:
Thora Jonsdottir;Ebba Thora Hvannberg;Helgi Sigurdsson;Sven Sigurdsson
Affiliations:
Cancer Center for Research and Development, Landspitali-University Hospital, Kopavogsbraut 5-7, 105 Kopavogur, Iceland;University of Iceland, VR-II, Hjardarhaga 2-6, 107 Reykjavik, Iceland;Cancer Center for Research and Development, Landspitali-University Hospital, Kopavogsbraut 5-7, 105 Kopavogur, Iceland;University of Iceland, VR-II, Hjardarhaga 2-6, 107 Reykjavik, Iceland
Venue:
Expert Systems with Applications: An International Journal
Year:
2008

Abstract

A Predictive Outcome Model (POM) for breast cancer was built, and its ability to accurately predict the 5-year outcome of a diagnosed case of cancer was assessed. A wide range of feature selection and classification methods were applied in order to find the best-performing algorithms on a given dataset. A special Model Selection Tool (MST) was developed to facilitate the search for the most efficient classifier model. The MST includes programs for choosing among classification algorithms, selecting subsets of features, dealing with imbalance in the data, and evaluating predictive performance by various measures. These steps are important in most data mining tasks, and it would be time-consuming to conduct them manually. The dataset, Rose, was assembled retroactively for this study and contains data records from 257 women diagnosed with primary breast cancer in Iceland during the years 1996-1998. An extra feature, containing the risk assessment of a doctor, was added to the dataset, which initially contained 400 features, both to see how much it could enhance the performance of the model and to investigate to what extent such a subjective assessment can be predicted from the remaining features. The main result is that similar performance is achieved regardless of which algorithm is used. Furthermore, the inclusion of the doctor's assessment does not appear to significantly enhance the performance. This is also reflected in the fact that, when the resulting Kappa values are compared, the models are in general more successful at predicting the doctor's risk assessment than the actual outcome.
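
The abstract compares models by their Kappa values, a measure that corrects raw agreement for the agreement expected by chance and is therefore informative on imbalanced data. The following is a minimal sketch of Cohen's kappa in plain Python, not the MST's implementation, which the abstract does not show. The function name `cohen_kappa`, the labels "good" and "poor", and the ten-record example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(actual)

    # Observed agreement: fraction of records where the two labels match.
    p_observed = sum(a == p for a, p in zip(actual, predicted)) / n

    # Chance agreement, computed from the marginal label frequencies.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    p_expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)

    if p_expected == 1.0:
        # Kappa is undefined when both sequences contain a single identical
        # label; returning 0.0 here is an assumption of this sketch.
        return 0.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical imbalanced example: 8 "good" outcomes and 2 "poor" outcomes.
actual = ["good"] * 8 + ["poor"] * 2
always_good = ["good"] * 10  # a trivial classifier that ignores the features

accuracy = sum(a == p for a, p in zip(actual, always_good)) / len(actual)
kappa_trivial = cohen_kappa(actual, always_good)
kappa_perfect = cohen_kappa(actual, actual)
```

In this example the trivial classifier reaches 80% accuracy, yet its kappa is zero because all of its agreement with the true labels is what chance alone would produce. That is why a chance-corrected measure is useful when the outcome classes are unevenly represented.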