Data mining a diabetic data warehouse

  • Authors:
  • Joseph L. Breault; Colin R. Goodall; Peter J. Fos

  • Affiliations:
  • Family Medicine, Ochsner Clinic Foundation, New Orleans, LA 70121, USA and Health Systems Management, Tulane University, New Orleans, LA 70112, USA; AT&T Shannon Research and Technology Laboratory, Middletown, NJ 07748, USA and Adjunct Appointment, Biostatistics, Tulane University, New Orleans, LA 70112, USA; School of Dentistry, University of Nevada Las Vegas, Las Vegas, NV 89154, USA and Health Systems Management, Tulane University, New Orleans, LA 70112, USA

  • Venue:
  • Artificial Intelligence in Medicine
  • Year:
  • 2002

Abstract

Diabetes is a major health problem in the United States. There is a long history of diabetic registries and databases with systematically collected patient information. We examine one such diabetic data warehouse, showing a method of applying data mining techniques and discussing some of the data issues, analysis problems, and results. The diabetic data warehouse is from a large integrated health care system in the New Orleans area with 30,383 diabetic patients. Translating a complex relational database with time series and sequencing information into a flat file suitable for data mining is challenging. We discuss two variables in detail: a comorbidity index and the HgbA1c, a measure of glycemic control related to outcomes. We used the classification tree approach in Classification and Regression Trees (CART®) with a binary target variable of HgbA1c > 9.5 and 10 predictors: age, sex, emergency department visits, office visits, comorbidity index, dyslipidemia, hypertension, cardiovascular disease, retinopathy, and end-stage renal disease. Unexpectedly, the most important variable associated with bad glycemic control is younger age, not the comorbidity index or whether patients have related diseases. If we want to target diabetics with bad HgbA1c values, the odds of finding them are 3.2 times as high in those