Application of irregular and unbalanced data to predict diabetic nephropathy using visualization and feature selection methods

Authors and affiliations:
Baek Hwan Cho (Department of Biomedical Engineering, Hanyang University, Seoul, Republic of Korea)
Hwanjo Yu (Department of Computer Science, University of Iowa, Iowa City, IA, USA)
Kwang-Won Kim (Division of Endocrinology and Metabolism, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea)
Tae Hyun Kim (Division of Endocrinology and Metabolism, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea)
In Young Kim (Department of Biomedical Engineering, Hanyang University, Seoul, Republic of Korea)
Sun I. Kim (Department of Biomedical Engineering, Hanyang University, Seoul, Republic of Korea)
Venue:
Artificial Intelligence in Medicine
Year:
2008

Abstract

Objective: Diabetic nephropathy is damage to the kidney caused by diabetes mellitus. It is a common complication and a leading cause of death in people with diabetes. However, the decline in kidney function varies considerably between patients, and the determinants of diabetic nephropathy have not been clearly identified. It is therefore very difficult to predict the onset of diabetic nephropathy accurately with simple statistical approaches such as the t-test or the χ²-test. To predict the onset of diabetic nephropathy accurately, we applied various machine learning techniques to an irregular and unbalanced diabetes dataset. These techniques included support vector machine (SVM) classification and feature selection methods. Another important objective was to visualize the risk factors, giving physicians intuitive information about each patient's clinical pattern.
Methods and materials: We collected medical data from 292 patients with diabetes and preprocessed the irregular data to extract 184 features. To predict the onset of diabetic nephropathy, we compared several classification methods, such as logistic regression, SVM, and SVM with a cost-sensitive learning method. We also applied several feature selection methods to remove redundant features and improve classification performance. For risk factor analysis with SVM classifiers, we developed a new visualization system based on a nomogram approach.
Results: Linear SVM classifiers combined with wrapper or embedded feature selection methods gave the best results. Of the 184 features, these classifiers selected the same 39 features and achieved an area under the receiver operating characteristic (ROC) curve of 0.969. The visualization tool presented the effect of each feature on the classifier's decision as graphical output.
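The classification step combines three ingredients named above: a linear SVM, cost-sensitive learning for the unbalanced classes, and embedded feature selection evaluated by ROC AUC. The following is a minimal pure-Python sketch of these ingredients under explicit assumptions. The cost-sensitive scheme shown is "balanced" class weighting of the hinge loss, the solver is plain full-batch subgradient descent, and the embedded method is recursive feature elimination (SVM-RFE, Guyon et al.). The abstract does not state which cost-sensitive scheme or embedded method the authors actually used. All function names, hyperparameters and the toy data are hypothetical.

```python
"""Sketch: cost-sensitive linear SVM, SVM-RFE feature selection, and ROC AUC.

Pure-Python illustration only; hyperparameters, toy data and all function
names are hypothetical and not taken from the paper.
"""


def class_weights(y):
    # Assumed "balanced" weighting: each class contributes equally to the
    # loss, so the minority (positive) class is upweighted.
    n, n_pos = len(y), sum(1 for t in y if t == 1)
    n_neg = n - n_pos
    return {1: n / (2 * n_pos), -1: n / (2 * n_neg)}


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    # Full-batch subgradient descent on
    #   lam/2 * ||w||^2 + (1/n) * sum_i c_i * max(0, 1 - y_i * (w . x_i + b))
    # where c_i is the class weight of sample i (cost-sensitive hinge loss).
    n, d = len(X), len(X[0])
    cw = class_weights(y)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active for this sample
                c = cw[yi]
                for j in range(d):
                    gw[j] -= c * yi * xi[j] / n
                gb -= c * yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_rfe(X, y, n_keep):
    # Recursive feature elimination: retrain on the remaining features, then
    # drop the one with the smallest squared weight, until n_keep remain.
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        w, _ = train_linear_svm([[x[j] for j in active] for x in X], y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        del active[worst]
    return active


def roc_auc(scores, labels):
    # AUC as the probability that a random positive outranks a random
    # negative (Mann-Whitney statistic); ties count as one half.
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up unbalanced toy data (3 positives, 5 negatives):
    # feature 0 is informative, feature 1 is noise.
    X = [[2, 0], [3, 1], [2.5, -1], [-1.5, 0], [-2, 0.5], [-2, -1], [-3, -0.5], [-2.5, 1]]
    y = [1, 1, 1, -1, -1, -1, -1, -1]
    print("selected features:", svm_rfe(X, y, n_keep=1))
    w, b = train_linear_svm(X, y)
    scores = [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]
    print("training AUC:", roc_auc(scores, y))
```

On this toy data the elimination loop keeps only feature 0. In a real study, both the feature selection and the AUC would be assessed on held-out data, for example by cross-validation, rather than on the training set as in this illustration.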
Conclusions: Using an irregular and unbalanced dataset, our proposed method can predict the onset of diabetic nephropathy about 2-3 months before the actual diagnosis with high prediction performance. Statistical methods such as the t-test and logistic regression could not achieve this. Additionally, the visualization system provides physicians with intuitive information for risk factor analysis. Physicians can therefore benefit from automatic early warnings for each patient and from visualized risk factors, which facilitate the planning of effective and appropriate treatment strategies.
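Nomogram-style visualization works especially simply for a linear SVM. The decision value is f(x) = b + Σ_j w_j·x_j, so each feature adds its own term w_j·x_j to the score. These terms can be drawn as bars that show which factors push a given patient toward or away from the positive (nephropathy) class. The following is a minimal text-only sketch of that additive idea. It is not the authors' tool, whose design the abstract does not describe. A full nomogram, as in the SVM nomograms of Jakulin et al., would also map the score to a probability, which this sketch omits. The weights, bias, feature names and patient values below are hypothetical placeholders.

```python
"""Sketch: nomogram-style view of a linear classifier's decision.

For f(x) = b + sum_j w_j * x_j, each feature adds the term w_j * x_j to the
score. Weights, bias, feature names and patient values are hypothetical.
"""


def contributions(weights, bias, x):
    # Per-feature additive effects on the decision value, plus the total score.
    parts = {name: w * x[name] for name, w in weights.items()}
    return parts, bias + sum(parts.values())


def render(parts, score, width=20):
    # Text "nomogram": features sorted by absolute effect, bar length scaled to
    # the largest effect; '+' bars raise the score, '-' bars lower it.
    largest = max(abs(v) for v in parts.values()) or 1.0
    lines = []
    for name, v in sorted(parts.items(), key=lambda kv: -abs(kv[1])):
        bar = ("+" if v >= 0 else "-") * round(width * abs(v) / largest)
        lines.append(f"{name:>10} {v:+7.2f} {bar}")
    side = "positive" if score > 0 else "negative"
    lines.append(f"{'score':>10} {score:+7.2f} ({side} side)")
    return "\n".join(lines)


if __name__ == "__main__":
    weights = {"feature_a": 0.8, "feature_b": -0.5, "feature_c": 0.1}  # hypothetical
    patient = {"feature_a": 1.2, "feature_b": 0.4, "feature_c": -2.0}  # hypothetical, standardized
    parts, score = contributions(weights, bias=-0.3, x=patient)
    print(render(parts, score))
```

Sorting by absolute contribution puts a patient's dominant risk factors first. That ordering matches the per-patient reading of risk factors described in the abstract.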