Clinical charge profiles prediction for patients diagnosed with chronic diseases using Multi-level Support Vector Machine

  • Authors:
  • Wei Zhong;Rick Chow;Jieyue He

  • Affiliations:
  • Division of Mathematics and Computer Science, University of South Carolina Upstate, SC 29303, USA;Division of Mathematics and Computer Science, University of South Carolina Upstate, SC 29303, USA;School of Computer Science and Engineering, Southeast University, Nanjing 210096, China

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 12.05

Abstract

This research utilizes the national Healthcare Cost & Utilization Project (HCUP-3) databases to construct Support Vector Machine (SVM) classifiers to predict clinical charge profiles, including hospital charges and length of stay (LOS), for patients diagnosed with heart and circulatory disease, diabetes and cancer, respectively. Clinical charge profile predictions can provide relevant clinical knowledge for healthcare policy makers to effectively manage healthcare services and costs at the national, state, and local levels. Despite its solid mathematical foundation and promising experimental results, SVM is poorly suited to large-scale data mining tasks since its training time complexity is at least quadratic in the number of samples. Furthermore, traditional SVM classification algorithms cannot build an effective SVM when different data distribution patterns are intermingled in a large dataset. In order to enhance SVM training for large, complex and noisy healthcare datasets, we propose the Multi-level Support Vector Machine (MLSVM), which organizes the dataset as clusters in a tree to produce better partitions for more effective SVM classification. The MLSVM model utilizes multiple SVMs, each of which efficiently learns the local data distribution patterns in a cluster. A decision fusion algorithm is used to generate an effective global decision that incorporates local SVM decisions at different levels of the tree. Consequently, MLSVM can handle complex and often conflicting data distributions in large datasets more effectively than single-SVM-based approaches and multiple-SVM systems. Both the combined 5x2-fold cross validation F test and the independent test show that the classification performance of MLSVM is much superior to that of CVM, ACSVM and CSVM based on three popular performance evaluation metrics. In this work, CSVM and MLSVM are parallelized to speed up the slow SVM training process for very large and complex datasets. Running time analysis shows that MLSVM can accelerate SVM's training process noticeably when the parallel algorithm is employed.
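
The abstract names the combined 5x2-fold cross validation F test as one of its two comparison procedures but does not spell it out. As an illustration only, the statistic in its standard textbook form can be sketched as below: two-fold cross validation is repeated five times, the difference in error rate between the two classifiers is recorded on each fold, and the resulting ratio is compared against an F distribution with 10 and 5 degrees of freedom. The function name combined_5x2cv_f and the sample differences are hypothetical; they are not taken from the paper, and the authors' exact procedure may differ in detail.

```python
def combined_5x2cv_f(diffs):
    """Combined 5x2cv F statistic for comparing two classifiers.

    diffs: five (p1, p2) pairs, one pair per replication of 2-fold
    cross validation; each value is the difference in error rate
    between the two classifiers on one fold.
    """
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected 5 replications with 2 folds each")

    sum_sq_diffs = 0.0   # sum of all ten squared differences
    sum_variances = 0.0  # sum of the five per-replication variances
    for p1, p2 in diffs:
        p_bar = (p1 + p2) / 2.0
        sum_sq_diffs += p1 ** 2 + p2 ** 2
        sum_variances += (p1 - p_bar) ** 2 + (p2 - p_bar) ** 2

    if sum_variances == 0.0:
        raise ValueError("statistic undefined: zero variance in every replication")

    # Under the null hypothesis this ratio is approximately F(10, 5).
    return sum_sq_diffs / (2.0 * sum_variances)


if __name__ == "__main__":
    # Hypothetical error-rate differences, for illustration only.
    example = [(0.031, 0.024), (0.018, 0.029), (0.027, 0.022),
               (0.035, 0.019), (0.021, 0.030)]
    print(f"f = {combined_5x2cv_f(example):.3f}")
```

A value above roughly 4.74, the upper 5% point of the F distribution with 10 and 5 degrees of freedom, would lead to rejecting the hypothesis that the two classifiers have the same error rate.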