Support Vector Machine for Outlier Detection in Breast Cancer Survivability Prediction

Authors:
Jaree Thongkam, Guandong Xu, Yanchun Zhang, Fuchun Huang
Affiliations:
School of Computer Science and Mathematics, Victoria University, Melbourne, Australia VIC 8001 (all four authors)
Venue:
Advanced Web and Network Technologies, and Applications
Year:
2008

Abstract

Finding and removing misclassified instances are important steps in data mining and machine learning, because such instances can degrade the performance of data mining algorithms in general. In this paper, we propose a C-Support Vector Classification Filter (C-SVCF) to identify and remove misclassified instances (outliers) from breast cancer survivability samples collected at Srinagarind Hospital in Thailand, with the aim of improving the accuracy of the prediction models. Only instances that the filter classifies correctly are passed to the learning algorithm. The performance of the proposed technique is measured by accuracy and by the area under the receiver operating characteristic curve (AUC), and is compared with several popular ensemble filter approaches: AdaBoost, Bagging, and ensembles of SVM with AdaBoost and Bagging filters. Our empirical results indicate that C-SVCF is an effective method for identifying misclassified outliers. This approach significantly benefits ongoing research into developing accurate and robust prediction models for breast cancer survivability.
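The filtering step described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: it fits a linear soft-margin SVM (the C-SVC primal objective) by subgradient descent instead of calling LIBSVM, it flags instances by re-classifying the training set with a model fitted on all of it, and it uses a hypothetical two-dimensional toy dataset in place of the Srinagarind records. The helper names `train_linear_csvc`, `csvc_filter`, `decision`, `accuracy` and `auc` are illustrative, and `auc` computes AUC as the probability that a random positive instance receives a higher decision value than a random negative one (ties count one half).

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def decision(w: Vector, b: float, x: Vector) -> float:
    """Signed decision value f(x) = w . x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_csvc(X: Sequence[Vector], y: Sequence[int], C: float = 1.0,
                      epochs: int = 1000, lr: float = 0.01) -> Tuple[List[float], float]:
    """Linear soft-margin SVM fitted by full-batch subgradient descent on the C-SVC primal:

        minimise 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)),  y_i in {-1, +1}.
    """
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for t in range(1, epochs + 1):
        grad_w, grad_b = list(w), 0.0            # start from the gradient of 0.5 * ||w||^2
        for x, label in zip(X, y):
            if label * decision(w, b, x) < 1.0:  # hinge term active (inside margin or wrong side)
                for j in range(dim):
                    grad_w[j] -= C * label * x[j]
                grad_b -= C * label
        step = lr / math.sqrt(t)                 # decaying step size
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def csvc_filter(X: Sequence[Vector], y: Sequence[int],
                C: float = 1.0) -> Tuple[List[int], List[int]]:
    """Return (kept, removed) indices: only instances the fitted SVM classifies correctly are kept."""
    w, b = train_linear_csvc(X, y, C)
    kept: List[int] = []
    removed: List[int] = []
    for i, (x, label) in enumerate(zip(X, y)):
        predicted = 1 if decision(w, b, x) >= 0 else -1
        (kept if predicted == label else removed).append(i)
    return kept, removed


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: two clusters plus one instance labelled -1 inside the +1 cluster.
    X = [(2, 2), (2, 3), (3, 2), (3, 3), (2.5, 2.5),
         (-2, -2), (-2, -3), (-3, -2), (-3, -3), (-2.5, -2.5),
         (2.2, 2.8)]
    y = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
    kept, removed = csvc_filter(X, y)
    print("removed indices:", removed)  # expected: [10]

    # Pass only the kept instances to the learner, then score it (on training data, illustration only).
    Xk, yk = [X[i] for i in kept], [y[i] for i in kept]
    w, b = train_linear_csvc(Xk, yk)
    scores = [decision(w, b, x) for x in Xk]
    preds = [1 if s >= 0 else -1 for s in scores]
    print("accuracy:", accuracy(preds, yk), "AUC:", auc(scores, yk))
```

In the toy data, the point (2.2, 2.8) labelled -1 lies inside the convex hull of the +1 cluster, so no linear classifier that gets the positives right can also get it right; it is the instance the filter should drop. Scoring on the same instances used for training only illustrates the helpers; a fair estimate needs held-out data or cross-validation, and a cross-validated filter (each instance judged by a model that did not see it) is a common alternative design. A kernel C-SVC, for example from LIBSVM, could replace `train_linear_csvc` and `decision` without changing the filtering logic.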