Dimensionality reduction for knowledge discovery in medical claims database: Application to antidepressant medication utilization study

  • Authors:
  • Samuel H. Huang;Lawson R. Wulsin;Hua Li;Jeff Guo

  • Affiliations:
  • Department of Mechanical Engineering, University of Cincinnati, Cincinnati, OH 45221, United States;Department of Psychiatry, University of Cincinnati, Cincinnati, OH 45221, United States;Department of Mechanical Engineering, University of Cincinnati, Cincinnati, OH 45221, United States;College of Pharmacy, University of Cincinnati, Cincinnati, OH 45221, United States

  • Venue:
  • Computer Methods and Programs in Biomedicine
  • Year:
  • 2009


Abstract

Data mining can discover knowledge embedded in large databases and so improve organizational decision-making. It therefore has the potential to contribute to efficiency and cost savings in the increasingly costly healthcare industry. An important step in mining medical databases is reducing dimensionality through feature selection. Traditionally, feature selection is done with stepwise regression, which tends to produce an unnecessarily large number of "significant" variables. This paper applies a filter-based feature selection method, which uses an inconsistency rate measure and discretization, to a medical claims database to predict whether patients take antidepressant medication for an adequate duration. Traditional stepwise logistic regression selected seven of the nine potential explanatory variables to characterize patients with inadequate antidepressant medication utilization. The filter-based method selected only two variables (age and number of claims) and achieved similar prediction accuracy. This comparison suggests that it may be feasible and efficient to apply the filter-based feature selection method to reduce the dimensionality of healthcare databases.
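The abstract names the ingredients of the filter method but not its details. The following is a minimal sketch that makes a few assumptions. It uses the common definition of the inconsistency rate from the feature-selection literature. Instances that share the same discretized values on a feature subset form a pattern. Each pattern contributes its size minus its majority-class count, and the sum is divided by the number of instances. The sketch also assumes equal-width discretization and an exhaustive search for the smallest subset whose rate does not exceed that of the full feature set. The paper's actual discretizer, stopping rule, and search strategy may differ. The helper names `equal_width_bins`, `inconsistency_rate`, and `smallest_consistent_subset`, as well as the toy data, are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def equal_width_bins(values, n_bins):
    """Discretize a numeric column into n_bins equal-width intervals (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the last bin instead of opening an extra one
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def inconsistency_rate(rows, labels, subset):
    """Sum over patterns of (pattern size - majority-class count), divided by N."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        # A "pattern" is the tuple of discretized values on the chosen features
        groups[tuple(row[j] for j in subset)][y] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def smallest_consistent_subset(rows, labels, n_features, tolerance=0.0):
    """Exhaustively search subsets by increasing size. Return the first subset whose
    rate is no worse than the full feature set's rate (plus a hypothetical tolerance)."""
    full_rate = inconsistency_rate(rows, labels, range(n_features))
    for k in range(1, n_features + 1):
        best = min(combinations(range(n_features), k),
                   key=lambda s: inconsistency_rate(rows, labels, s))
        if inconsistency_rate(rows, labels, best) <= full_rate + tolerance:
            return best
    return tuple(range(n_features))


# Made-up toy data (not from the study): age, number of claims, and a noise column
ages = [25, 30, 70, 75, 28, 72]
claims = [2, 15, 3, 14, 1, 16]
noise = [1, 0, 0, 1, 0, 0]
adequate = [0, 1, 1, 0, 0, 0]  # hypothetical class label

columns = [equal_width_bins(col, 2) for col in (ages, claims, noise)]
rows = list(zip(*columns))  # one tuple of bin indices per patient
selected = smallest_consistent_subset(rows, adequate, n_features=3)
print(selected)  # (0, 1): age and number of claims
```

In this toy data no single column separates the classes, because each one alone has an inconsistency rate of 2/6. Age and number of claims together reach the full set's rate of 0, so the noise column is dropped. An exhaustive search is practical at the scale described in the abstract, because nine candidate variables give only 2^9 - 1 = 511 non-empty subsets. Larger databases would need a heuristic or randomized search instead.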