Using Data Mining Techniques in Monitoring Diabetes Care. The Simpler the Better?

  • Authors:
  • Dario Gregori; Michele Petrinco; Simona Bo; Rosalba Rosato; Eva Pagano; Paola Berchialla; Franco Merletti

  • Affiliations:
  • Laboratories of Epidemiological Methods and Biostatistics, Department of Environmental Medicine and Public Health, University of Padova, Padova, Italy 35121; Unit of Cancer Epidemiology, University of Torino, and CPO Piemonte, Turin, Italy; Department of Internal Medicine, University of Torino, Turin, Italy; Unit of Cancer Epidemiology, University of Torino, and CPO Piemonte, Turin, Italy; Unit of Cancer Epidemiology, University of Torino, and CPO Piemonte, Turin, Italy; Department of Public Health and Microbiology, University of Torino, Turin, Italy; Unit of Cancer Epidemiology, University of Torino, and CPO Piemonte, Turin, Italy

  • Venue:
  • Journal of Medical Systems
  • Year:
  • 2011

Abstract

We aim to evaluate how data-mining statistical techniques can be applied to medical records and administrative data on diabetes, and how they differ in their ability to predict outcomes (e.g., death). Data were analyzed on 3,892 outpatients with a diagnosis of type 2 diabetes from the San Giovanni Battista Hospital in Torino. Six statistical classifiers were applied: Logistic Regression (LR), Generalized Additive Model (GAM), Projection Pursuit Regression (PPR), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Artificial Neural Networks (ANN). All models selected the same subset of covariates. ANN performed worst, whereas simpler models such as LR, GAM and LDA appear to perform better. GAM is associated with a very small misclassification rate. The agreement among models in predicting individual outcomes is 0.23 (Kappa, SE 0.06). Monitoring on the basis of patients' characteristics is therefore highly dependent on the statistical properties of the chosen model.
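The two summary measures named in the abstract, misclassification rate and Kappa agreement among models, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say which Kappa variant was used, so the sketch averages pairwise Cohen's kappa, and the labels, the model names and the helper functions (`misclassification_rate`, `cohen_kappa`, `mean_pairwise_kappa`) are hypothetical toy material, not the study's data or code.

```python
from itertools import combinations


def misclassification_rate(y_true, y_pred):
    """Fraction of cases where the predicted label differs from the observed one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    labels = set(a) | set(b)
    # Observed proportion of cases on which the two raters agree.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    p_exp = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_exp == 1.0:
        # Both raters always give the same single label; treat as perfect agreement.
        return 1.0
    return (p_obs - p_exp) / (1.0 - p_exp)


def mean_pairwise_kappa(predictions):
    """Average Cohen's kappa over all pairs of models (an assumed summary)."""
    pairs = list(combinations(predictions.values(), 2))
    if not pairs:
        raise ValueError("need at least two models")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy outcome labels (1 = event, 0 = no event); NOT the study data.
    y_true = [0, 0, 0, 0, 1, 1, 1, 1]
    # Hypothetical predictions from three unnamed models.
    predictions = {
        "model_a": [0, 0, 0, 0, 1, 1, 1, 0],
        "model_b": [0, 0, 1, 0, 1, 1, 0, 1],
        "model_c": [1, 0, 0, 0, 1, 0, 1, 1],
    }
    for name, y_pred in predictions.items():
        print(name, "misclassification rate:", misclassification_rate(y_true, y_pred))
    print("mean pairwise kappa:", mean_pairwise_kappa(predictions))
```

The sketch shows why the two measures can diverge: models with similar misclassification rates may still err on different patients, which lowers their agreement on individual predictions.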