Finding and removing misclassified instances is an important step in data mining and machine learning, since such instances affect the performance of the learning algorithm. In this paper, we propose a C-Support Vector Classification Filter (C-SVCF) that identifies and removes misclassified instances (outliers) from breast cancer survivability samples collected at Srinagarind Hospital in Thailand, with the aim of improving the accuracy of the prediction models. Only instances that the filter classifies correctly are passed to the learning algorithm. We measure the performance of the proposed technique by accuracy and area under the receiver operating characteristic curve (AUC), and compare it with several popular ensemble filter approaches: AdaBoost, Bagging, and ensembles of SVMs with AdaBoost and Bagging filters. Our empirical results indicate that C-SVCF is an effective method for identifying misclassified outliers. This approach significantly benefits ongoing research on developing accurate and robust prediction models for breast cancer survivability.
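The filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the kernel, the value of C, or whether the filter is applied with cross-validation, so the sketch assumes a linear soft-margin SVM trained on the full sample by plain subgradient descent, and it uses made-up one-dimensional toy data. The names `train_linear_svm`, `predict` and `svm_filter` are hypothetical.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=300):
    """Fit a linear soft-margin SVM by batch subgradient descent.

    Minimises 0.5 * ||w||^2 + C * sum(max(0, 1 - y_i * (w . x_i + b))).
    Labels must be +1 or -1. Assumption: a simple stand-in for a real
    C-SVC solver, chosen only to keep the example dependency-free.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regulariser 0.5 * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # instance lies inside or beyond the margin
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return the predicted label (+1 or -1) for one instance."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


def svm_filter(X, y, C=1.0):
    """Split instance indices into (kept, removed).

    An instance is kept only if the trained filter classifies it correctly.
    Assumption: the filter is trained on the same instances it screens.
    """
    w, b = train_linear_svm(X, y, C=C)
    kept, removed = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        (kept if predict(w, b, xi) == yi else removed).append(i)
    return kept, removed


# Toy data (invented): instance 2 sits among the negatives but is labeled +1.
X = [[-3.0], [-2.5], [-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [-1, -1, 1, -1, -1, 1, 1, 1, 1, 1]

kept, removed = svm_filter(X, y)
print("kept:", kept)
print("removed:", removed)
```

Only the instances in `kept` would then be handed to the downstream learner. Scoring held-out instances with cross-validation instead of the training instances themselves is a common variant of this kind of filter, and may be closer to what a given study does.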