Medical data are multidimensional and are described by a large number of features. Hundreds of independent features (parameters) in these high-dimensional databases must be considered and analyzed simultaneously to extract information that supports decision-making in medical prediction. Most data mining methods depend on a set of features that define the behavior of the learning algorithm and directly or indirectly influence the complexity of the resulting models. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must first be preprocessed with an efficient dimensionality reduction method. The aim of this study is to improve the diagnostic accuracy for diabetes by selecting informative features of the Pima Indians Diabetes dataset. This study proposes a Hybrid Prediction Model with an F-score feature selection approach to identify the optimal feature subset of the Pima Indians Diabetes dataset. The features of the dataset are ranked by F-score, and the feature subset that yields the minimal clustering error is taken as the optimal feature subset. The correctly classified instances determine the pattern for diagnosis and are used in the subsequent classification step. The improved performance of the Support Vector Machine (SVM) classifier, measured in terms of accuracy, sensitivity, specificity and Area Under the Curve (AUC), shows that the proposed feature selection approach improves classification performance. The proposed prediction model achieves a predictive accuracy of 98.9427%, the highest reported for this dataset among the models in the literature for this problem.
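The abstract names the steps (rank features by F-score, keep the subset with minimal clustering error, evaluate by sensitivity and specificity) without defining them. The sketch below assumes the F-score definition commonly used with SVMs (Chen and Lin): the squared distances of each class mean from the overall mean, divided by the sum of the two within-class sample variances. It also assumes, as a stand-in for the unnamed clustering step, a plain 2-means clustering with farthest-point initialization, and it searches nested subsets of the top-ranked features. All function names (`f_score`, `two_means`, `clustering_error`, `select_subset`, `sens_spec`) are hypothetical; this is an illustration, not the authors' code, and the SVM stage is not reproduced.

```python
from statistics import mean


def f_score(X, y):
    """Assumed Chen-Lin F-score per feature; y holds 0/1 labels, >= 2 samples per class."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        cp, cn = [r[j] for r in pos], [r[j] for r in neg]
        m, mp, mn = mean(r[j] for r in X), mean(cp), mean(cn)
        between = (mp - m) ** 2 + (mn - m) ** 2
        within = (sum((v - mp) ** 2 for v in cp) / (len(cp) - 1)
                  + sum((v - mn) ** 2 for v in cn) / (len(cn) - 1))
        if within > 0:
            scores.append(between / within)
        else:  # zero within-class spread: perfect separator, or a constant feature
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def two_means(points, iters=20):
    """Hypothetical stand-in clustering: 2-means seeded with point 0 and its farthest point."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    centroids = [list(c0), list(c1)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centroids[k] = [mean(col) for col in zip(*members)]
    return assign


def clustering_error(assign, y):
    """Fraction of mismatches under the better of the two cluster-to-label mappings."""
    wrong = sum(a != t for a, t in zip(assign, y))
    return min(wrong, len(y) - wrong) / len(y)


def select_subset(X, y):
    """Grow nested subsets in F-score order; keep the smallest one with minimal error."""
    scores = f_score(X, y)
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    best, best_err = None, float("inf")
    for k in range(1, len(order) + 1):
        subset = order[:k]
        points = [[row[j] for j in subset] for row in X]
        err = clustering_error(two_means(points), y)
        if err < best_err:  # strict: ties keep the smaller subset
            best, best_err = subset, err
    return best, best_err


def sens_spec(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [5.0, 4.5], [5.2, 3.5], [4.8, 5.0]]
y = [0, 0, 0, 1, 1, 1]
print(select_subset(X, y))  # ([0], 0.0)
```

On this toy data feature 0 receives by far the higher F-score, and clustering on it alone already separates the classes, so the search stops improving after the first feature. Ties are resolved in favour of the smaller subset, which matches the aim of reducing dimensionality; on real data such as the Pima Indians dataset the clustering algorithm, its initialization and the tie rule would all need to match what the study actually used.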