Clinical charge profiles prediction for patients diagnosed with chronic diseases using Multi-level Support Vector Machine

Authors:
Wei Zhong;Rick Chow;Jieyue He
Affiliations:
Division of Mathematics and Computer Science, University of South Carolina Upstate, SC 29303, USA;Division of Mathematics and Computer Science, University of South Carolina Upstate, SC 29303, USA;School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

This research uses the national Healthcare Cost & Utilization Project (HCUP-3) databases to construct Support Vector Machine (SVM) classifiers that predict clinical charge profiles, including hospital charges and length of stay (LOS), for patients diagnosed with heart and circulatory disease, diabetes, and cancer, respectively. Clinical charge profile predictions can provide relevant clinical knowledge that helps healthcare policy makers manage healthcare services and costs effectively at the national, state, and local levels. Despite its solid mathematical foundation and promising experimental results, SVM is not well suited to large-scale data mining tasks, since its training time complexity is at least quadratic in the number of samples. Furthermore, traditional SVM classification algorithms cannot build an effective SVM when different data distribution patterns are intermingled in a large dataset. To enhance SVM training for large, complex, and noisy healthcare datasets, we propose the Multi-level Support Vector Machine (MLSVM), which organizes the dataset as clusters in a tree to produce better partitions for more effective SVM classification. The MLSVM model uses multiple SVMs, each of which efficiently learns the local data distribution patterns in one cluster. A decision fusion algorithm then generates a global decision that incorporates the local SVM decisions at different levels of the tree. Consequently, MLSVM can handle complex and often conflicting data distributions in large datasets more effectively than single-SVM approaches and other multiple-SVM systems. Both the combined 5x2-fold cross-validation F test and the independent test show that the classification performance of MLSVM is substantially better than that of CVM, ACSVM, and CSVM on three popular performance evaluation metrics. In this work, CSVM and MLSVM are parallelized to speed up the slow SVM training process for very large and complex datasets. Running time analysis shows that MLSVM noticeably accelerates SVM training when the parallel algorithm is employed.
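
As an illustration of the combined 5x2-fold cross-validation F test mentioned above, the following is a minimal sketch of how the test statistic is usually computed when comparing two classifiers. It is not taken from the paper. The function name `combined_5x2cv_f` and the input layout (five pairs of per-fold differences in error rate between the two classifiers, one pair per replication of 2-fold cross-validation) are assumptions made for this example. Under the null hypothesis that the two classifiers perform equally, the statistic approximately follows an F distribution with 10 and 5 degrees of freedom, so a value above roughly 4.74 would be treated as significant at the 0.05 level.

```python
def combined_5x2cv_f(diffs):
    """Combined 5x2 cv F statistic for comparing two classifiers.

    diffs: five (p1, p2) pairs, where p1 and p2 are the differences in
    error rate between the two classifiers on fold 1 and fold 2 of one
    replication of 2-fold cross-validation (hypothetical input layout).
    """
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected five (fold 1, fold 2) difference pairs")

    # Numerator: sum of all ten squared differences.
    numerator = sum(p * p for pair in diffs for p in pair)

    # Denominator: twice the sum of the per-replication variance estimates.
    variance_sum = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2

    if variance_sum == 0.0:
        raise ZeroDivisionError("all per-replication variances are zero")

    return numerator / (2.0 * variance_sum)


# Made-up differences, for illustration only (not results from the paper).
example_diffs = [(0.02, 0.04)] * 5
f_statistic = combined_5x2cv_f(example_diffs)
print(f_statistic)
```

The statistic pools all ten squared differences in the numerator, which is why it is described as a combined test: it does not depend on which single fold is chosen as the reference.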