Using Data Mining Techniques in Monitoring Diabetes Care. The Simpler the Better?

Authors:
Dario Gregori; Michele Petrinco; Simona Bo; Rosalba Rosato; Eva Pagano; Paola Berchialla; Franco Merletti
Affiliations:
Laboratories of Epidemiological Methods and Biostatistics, Department of Environmental Medicine and Public Health, University of Padova, Padova, Italy 35121; Unit of Cancer Epidemiology, University of Torino, and CPO Piemonte, Turin, Italy; Department of Internal Medicine, University of Torino, Turin, Italy; Unit of Cancer Epidemiology, University of Torino, and CPO Piemonte, Turin, Italy; Unit of Cancer Epidemiology, University of Torino, and CPO Piemonte, Turin, Italy; Department of Public Health and Microbiology, University of Torino, Turin, Italy; Unit of Cancer Epidemiology, University of Torino, and CPO Piemonte, Turin, Italy
Venue:
Journal of Medical Systems
Year:
2011

Abstract

We aim to evaluate how data-mining statistical techniques can be applied to medical records and administrative data on diabetes, and how they differ in their ability to predict outcomes (e.g. death). The data cover 3,892 outpatients with a diagnosis of type 2 diabetes from the San Giovanni Battista Hospital in Torino. Six statistical classifiers were applied: Logistic Regression (LR), Generalized Additive Model (GAM), Projection Pursuit Regression (PPR), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Artificial Neural Networks (ANN). All models selected the same subset of covariates. ANN performed worst, whereas simpler models such as LR, GAM and LDA seem to perform better. GAM is associated with a very small misclassification rate. The agreement among models in predicting individual outcomes is 0.23 (Kappa, SE 0.06). Monitoring on the basis of patients' characteristics is highly dependent on the statistical properties of the chosen model.
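
As an illustration of the agreement statistic mentioned in the abstract, the sketch below computes a multi-rater kappa across the predictions of several classifiers. The abstract does not say which kappa variant was used; Fleiss' kappa is assumed here as one common choice for more than two raters. The function name `fleiss_kappa` and the toy predictions are hypothetical and are not taken from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each rated once by every rater.

    ratings: list of rows; each row holds the labels assigned to one
    subject (patient) by each rater (here: each classifier).
    All rows must have the same length (number of raters >= 2).
    """
    n_subjects = len(ratings)
    if n_subjects == 0:
        raise ValueError("need at least one subject")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: per-subject proportion of agreeing rater pairs.
    per_subject = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall share of each category.
    shares = [
        sum(c[k] for c in counts) / (n_subjects * n_raters) for k in categories
    ]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        # Only one category was ever used; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical predicted outcomes (1 = death, 0 = alive) for five
    # patients; columns stand for LR, GAM, PPR, LDA, QDA, ANN in that order.
    toy_predictions = [
        [1, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 1],
        [1, 1, 0, 1, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 1],
    ]
    print(f"Fleiss' kappa: {fleiss_kappa(toy_predictions):.3f}")
```

A value near 1 indicates that the models almost always predict the same outcome for the same patient, while a value near 0 indicates agreement no better than chance. On that scale, the reported 0.23 corresponds to low agreement among the six models.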