A hybrid prediction model with F-score feature selection for type II Diabetes databases

Authors:
B. Sarojini Ilango;N. Ramaraj
Affiliations:
K.L.N. College of Information Technology, Madurai;G.K.M. College of Engineering & Technology, Chennai
Venue:
Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India
Year:
2010

Abstract

Medical data are multidimensional and are represented by a large number of features. Hundreds of independent features (parameters) in these high-dimensional databases must be considered and analyzed simultaneously to extract information that is valuable for decision making in medical prediction. Most data mining methods depend on a set of features that define the behavior of the learning algorithm and directly or indirectly influence the complexity of the resulting models. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be preprocessed by an efficient dimensionality reduction method. The aim of this study is to improve the diagnostic accuracy for diabetes by selecting informative features of the Pima Indians Diabetes dataset. The study proposes a Hybrid Prediction Model with an F-score feature selection approach to identify the optimal feature subset of this dataset. The features are ranked by F-score, and the feature subset that gives the minimal clustering error is taken as the optimal feature subset. The correctly classified instances determine the pattern for diagnosis and are used in the subsequent classification step. The improved performance of the Support Vector Machine classifier, measured in terms of accuracy, sensitivity, specificity, and Area Under the Curve (AUC), shows that the proposed feature selection approach improves classification performance. The proposed prediction model achieves a predictive accuracy of 98.9427%, which is the highest predictive accuracy for this dataset compared with other models in the literature for this problem.
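The F-score ranking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes the two-class F-score commonly used for feature ranking: the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. The function names `f_score` and `rank_features` and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def f_score(column, labels):
    """Two-class F-score of one feature; labels are 1 (positive) or 0 (negative).

    Assumed definition (not stated in the abstract): between-class
    separation divided by within-class spread.
    """
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    overall = mean(column)
    # Between-class separation: squared distance of each class mean
    # from the overall mean of the feature.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class spread: sum of the two sample variances (n - 1 denominator).
    within = variance(pos) + variance(neg)
    return between / within


def rank_features(rows, labels):
    """Return (feature_index, score) pairs sorted from highest to lowest F-score."""
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    scores = [f_score(col, labels) for col in columns]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Toy data made up for illustration; this is NOT the Pima Indians Diabetes data.
# Feature 0 separates the two classes well, feature 1 does not.
toy_rows = [
    [1.0, 5.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [7.0, 2.0],
    [8.0, 5.0],
    [9.0, 3.0],
]
toy_labels = [0, 0, 0, 1, 1, 1]

ranking = rank_features(toy_rows, toy_labels)
print(ranking)
```

A larger score means the feature separates the two classes better. In a pipeline like the one the abstract describes, candidate subsets would presumably be formed from the top-ranked features and compared by clustering error; that selection loop is not shown here because the abstract does not specify it.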