FeaFiner: biomarker identification from medical data through feature generalization and selection

Authors:
Jiayu Zhou;Zhaosong Lu;Jimeng Sun;Lei Yuan;Fei Wang;Jieping Ye
Affiliations:
Arizona State University, Tempe, AZ, USA;Simon Fraser University, Burnaby, BC, Canada;IBM T.J. Watson Research Center, Yorktown Heights, NY, USA;Arizona State University, Tempe, AZ, USA;IBM T.J. Watson Research Center, Yorktown Heights, NY, USA;Arizona State University, Tempe, AZ, USA
Venue:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2013

Citing 11
Cited 0

Sparse graphical models for exploring gene expression data
Journal of Multivariate Analysis

On Model Selection Consistency of Lasso
The Journal of Machine Learning Research

A coordinate gradient descent method for nonsmooth separable minimization
Mathematical Programming: Series A and B

Group lasso with overlap and graph lasso
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning

Efficient Euclidean projections in linear time
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning

Large-scale sparse logistic regression
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Sparse reconstruction by separable approximation
IEEE Transactions on Signal Processing

Multi-task feature learning via efficient l2,1-norm minimization
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence

A multi-task learning formulation for predicting disease progression
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Statistics for High-Dimensional Data: Methods, Theory and Applications

Modeling disease progression via fused sparse group lasso
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Abstract

Traditionally, feature construction and feature selection are two important but separate processes in data mining. However, many real-world applications require an integrated approach for creating, refining, and selecting features. To address this problem, we propose FeaFiner (short for Feature Refiner), an efficient formulation that simultaneously generalizes low-level features into higher-level concepts and selects the relevant concepts based on the target variable. Specifically, we formulate a double sparsity optimization problem that identifies groups in the low-level features, generalizes higher-level features using the groups, and performs feature selection. Since non-overlapping groups are preferred in many clinical studies for better interpretability, we further improve the formulation to generalize features using mutually exclusive feature groups. The proposed formulation is challenging to solve due to its orthogonality constraints, non-convex objective, and non-smooth penalties. We apply a recently developed augmented Lagrangian method to solve this formulation, in which each subproblem is solved by a non-monotone spectral projected gradient method. Our numerical experiments show that this approach is computationally efficient and capable of producing high-quality solutions. We also present a generalization bound showing the consistency and the asymptotic behavior of the learning process of our proposed formulation. Finally, the proposed FeaFiner method is validated on the Alzheimer's Disease Neuroimaging Initiative dataset, where low-level biomarkers are automatically generalized into robust higher-level concepts, which are then selected for predicting the disease status measured by the Mini Mental State Examination and the Alzheimer's Disease Assessment Scale cognitive subscore. Compared to existing predictive modeling methods, FeaFiner provides intuitive and robust feature concepts and competitive predictive accuracy.
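
The idea of generalizing features through mutually exclusive groups and then selecting among the resulting concepts can be illustrated with a small sketch. This is not the FeaFiner formulation or its augmented Lagrangian solver: the abstract learns the groups and the selection jointly, whereas the sketch below assumes the grouping is already given and runs a plain lasso (solved by proximal gradient) on the grouped features. The names `group_matrix`, `lasso_ista`, the toy data, and the value of `lam` are all hypothetical and chosen for illustration only.

```python
import numpy as np

def group_matrix(assignments, n_groups):
    """Build a d x k matrix Q from a hard assignment of features to groups.

    Each feature belongs to exactly one group, so the columns of Q have
    disjoint supports; after scaling to unit length they are orthonormal.
    """
    d = len(assignments)
    Q = np.zeros((d, n_groups))
    Q[np.arange(d), assignments] = 1.0
    norms = np.linalg.norm(Q, axis=0)
    norms[norms == 0] = 1.0  # leave empty groups as zero columns
    return Q / norms

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(Z, y, lam, n_iter=500):
    """Minimise (0.5 / n) * ||Z w - y||^2 + lam * ||w||_1 by proximal gradient."""
    n, k = Z.shape
    # Step size 1/L, where L = sigma_max(Z)^2 / n bounds the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(Z, 2) ** 2)
    w = np.zeros(k)
    for _ in range(n_iter):
        grad = Z.T @ (Z @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Toy data (assumed): six low-level features in three fixed, disjoint groups.
rng = np.random.default_rng(0)
n, d, k = 200, 6, 3
X = rng.normal(size=(n, d))
assignments = np.array([0, 0, 1, 1, 2, 2])

Q = group_matrix(assignments, k)   # feature-to-concept map
Z = X @ Q                          # higher-level concept features
y = 2.0 * Z[:, 0] + 0.01 * rng.normal(size=n)  # only concept 0 drives the target

w = lasso_ista(Z, y, lam=0.1)
selected = np.flatnonzero(np.abs(w) > 1e-8)
print("concept weights:", w)
print("selected concepts:", selected)
```

Fixing the groups in advance keeps the sketch convex and short. Learning the assignment matrix under orthogonality constraints, as the abstract describes, is what makes the actual problem non-convex and motivates the augmented Lagrangian approach.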