Toward breast cancer survivability prediction models through improving training space

Authors:
Jaree Thongkam; Guandong Xu; Yanchun Zhang; Fuchun Huang
Affiliations:
School of Computer Science and Mathematics, Victoria University, P.O. Box 14428, Melbourne, Vic. 8001, Australia (all authors)
Venue:
Expert Systems with Applications: An International Journal
Year:
2009


Abstract

Outliers and skewed class distributions make the prediction of breast cancer survivability a challenging problem in data mining and pattern recognition, especially in medical research. To address these problems, we propose a hybrid approach that generates higher-quality data sets for building improved breast cancer survival prediction models. The approach comprises two main steps: (1) an outlier filtering step based on C-Support Vector Classification (C-SVC) that identifies and eliminates outlier instances; and (2) an over-sampling step that samples with replacement to increase the number of instances in the minority class. To assess the capability and effectiveness of the proposed approach, we used several measures: basic performance measures (e.g., accuracy, sensitivity, and specificity), the Area Under the receiver operating characteristic Curve (AUC), and the F-measure. A 10-fold cross-validation method was used to reduce the bias and variance of the results. The results indicate that the proposed approach improves the performance of breast cancer survivability prediction models by up to 28.34%, owing to the improved training data space.
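
The two data-preparation steps and the basic performance measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It rests on several assumptions: the function names (`filter_outliers`, `oversample_minority`, `evaluate`) are hypothetical; an "outlier" is taken to be an instance that a trained classifier misclassifies; the classifier is passed in as a plain callable rather than an actual C-SVC model; the minority class is over-sampled until the two classes are equal in size, a ratio the abstract does not state; and the problem is binary. AUC and 10-fold cross-validation are left out to keep the sketch short.

```python
import random
from collections import Counter


def filter_outliers(rows, labels, predict):
    """Keep only instances whose label agrees with predict(row).

    Assumption: an instance counts as an outlier when the classifier
    (C-SVC in the abstract, any callable here) misclassifies it.
    """
    kept = [(r, y) for r, y in zip(rows, labels) if predict(r) == y]
    return [r for r, _ in kept], [y for _, y in kept]


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class instances (sampling with
    replacement) until both classes have the same size.

    Assumption: exactly two classes, balanced to a 1:1 ratio.
    """
    rng = random.Random(seed)
    (_, n_major), (minor, n_minor) = Counter(labels).most_common(2)
    pool = [r for r, y in zip(rows, labels) if y == minor]
    extra = [rng.choice(pool) for _ in range(n_major - n_minor)]
    return rows + extra, labels + [minor] * len(extra)


def evaluate(actual, predicted, positive):
    """Accuracy, sensitivity, specificity and F-measure for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)

    def ratio(num, den):
        # Avoid division by zero when a class or prediction is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f_measure": ratio(2 * precision * sensitivity,
                           precision + sensitivity),
    }


# Toy data (made up for illustration): one numeric feature per instance.
rows = [(1.0,), (2.0,), (3.0,), (4.0,), (9.0,), (8.0,), (2.0,)]
labels = ["survived"] * 4 + ["not_survived"] * 3


def toy_predict(row):
    # Stand-in for a trained classifier: a fixed threshold rule.
    return "survived" if row[0] < 5 else "not_survived"


clean_rows, clean_labels = filter_outliers(rows, labels, toy_predict)
bal_rows, bal_labels = oversample_minority(clean_rows, clean_labels)
scores = evaluate([1, 1, 0, 0], [1, 0, 0, 0], positive=1)
```

In this toy run the last instance, `(2.0,)` labelled `not_survived`, disagrees with the stand-in classifier and is filtered out, after which the two remaining minority instances are re-sampled until the classes are balanced. In practice the `predict` callable would wrap a trained model; scikit-learn's `sklearn.svm.SVC` is one C-SVC implementation that could be wrapped to fill that role.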