An integrated machine learning approach to stroke prediction

Authors:
Aditya Khosla;Yu Cao;Cliff Chiung-Yu Lin;Hsu-Kuang Chiu;Junling Hu;Honglak Lee
Affiliations:
Stanford University, Stanford, USA;Stanford University, Stanford, USA;Stanford University, Stanford, USA;Stanford University, Stanford, USA;eBay Inc, San Jose, USA;Stanford University, Stanford, USA
Venue:
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2010

Abstract

Stroke is the third leading cause of death and the principal cause of serious long-term disability in the United States. Accurate prediction of stroke is highly valuable for early intervention and treatment. In this study, we compare the Cox proportional hazards model with a machine learning approach for stroke prediction on the Cardiovascular Health Study (CHS) dataset. Specifically, we consider the common problems of data imputation, feature selection, and prediction in medical datasets. We propose a novel automatic feature selection algorithm that selects robust features based on a heuristic we introduce, the conservative mean. Combined with Support Vector Machines (SVMs), our feature selection algorithm achieves a higher area under the ROC curve (AUC) than both the Cox proportional hazards model and the L1-regularized Cox feature selection algorithm. Furthermore, we present a margin-based censored regression algorithm that combines the concept of margin-based classifiers with censored regression to achieve a better concordance index than the Cox model. Overall, our approach outperforms the current state of the art on both metrics, AUC and concordance index. In addition, our work identifies potential risk factors that traditional approaches have not discovered. Our method can be applied to clinical prediction of other diseases where missing data are common and risk factors are not well understood.
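
The abstract evaluates censored regression by concordance index. As a rough illustration of that metric only, here is a minimal sketch of the standard pairwise definition (Harrell's C) for right-censored data. It is not the paper's code: the function name, argument names, and toy data are hypothetical, and pairs with tied observation times are skipped as a simplifying assumption.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Pairwise concordance index for right-censored data (illustrative sketch).

    times  : observed time per subject (event time or censoring time)
    events : 1 if the event was observed, 0 if the subject was censored
    risks  : predicted risk score (higher means an earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored, so the order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # model ranks the earlier event as higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, made up for illustration: the third subject is censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)  # 4 of 5 comparable pairs agree
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of comparable pairs. The quadratic loop is fine for a sketch but slow on large cohorts.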