Feature selection for high-dimensional genomic microarray data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
An introduction to variable and feature selection
The Journal of Machine Learning Research
Feature selection, L1 vs. L2 regularization, and rotational invariance
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Imputation of Missing Values in DNA Microarray Gene Expression Data
CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
A support vector method for multivariate performance measures
ICML '05 Proceedings of the 22nd international conference on Machine learning
Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing)
Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing)
Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches
ECML '07 Proceedings of the 18th European conference on Machine Learning
Toward personalized care management of patients at risk: the diabetes case study
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part I
An integrated data mining approach to real-time clinical monitoring and deterioration warning
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Hi-index | 0.00 |
Stroke is the third leading cause of death and the principal cause of serious long-term disability in the United States. Accurate prediction of stroke is highly valuable for early intervention and treatment. In this study, we compare the Cox proportional hazards model with a machine learning approach for stroke prediction on the Cardiovascular Health Study (CHS) dataset. Specifically, we consider the common problems of data imputation, feature selection, and prediction in medical datasets. We propose a novel automatic feature selection algorithm that selects robust features based on our proposed heuristic: conservative mean. Combined with Support Vector Machines (SVMs), our proposed feature selection algorithm achieves a greater area under the ROC curve (AUC) as compared to the Cox proportional hazards model and L1 regularized Cox feature selection algorithm. Furthermore, we present a margin-based censored regression algorithm that combines the concept of margin-based classifiers with censored regression to achieve a better concordance index than the Cox model. Overall, our approach outperforms the current state-of-the-art in both metrics of AUC and concordance index. In addition, our work has also identified potential risk factors that have not been discovered by traditional approaches. Our method can be applied to clinical prediction of other diseases, where missing data are common and risk factors are not well understood.