With advances in, and the increasing sophistication of, data collection techniques, many applications now face large amounts of data collected from multiple heterogeneous sources. For example, in the study of Alzheimer's Disease (AD), different types of measurements, such as neuroimages, gene/protein expression data, and genetic data, are often collected and analyzed together for improved predictive power. Joint learning from multiple data sources is believed to be beneficial because different sources may contain complementary information, and feature pruning and data source selection are critical for learning interpretable models from high-dimensional data. Very often the collected data comes with block-wise missing entries; for example, a patient without an MRI scan has no information in the MRI data block, which makes that patient's overall record incomplete. There has been growing interest in the data mining community in extending traditional techniques for single-source, complete data analysis to the study of multi-source, incomplete data. The key challenge is how to effectively integrate information from multiple heterogeneous sources in the presence of block-wise missing data. In this paper we first investigate the case of complete data and present a unified "bi-level" learning model for multi-source data. We then give a natural extension of this model to the more challenging case of incomplete data. Our major contributions are threefold: (1) the proposed models handle both feature-level and source-level analysis in a unified formulation and include several existing feature learning approaches as special cases; (2) the model for incomplete data avoids direct imputation of the missing elements and thus provides superior performance, and it can be easily generalized to other applications with block-wise missing data sources; (3) efficient optimization algorithms are presented for both the complete and incomplete models.
We have performed comprehensive evaluations of the proposed models on the application of AD diagnosis. Our proposed models compare favorably against existing approaches.
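To make the notion of block-wise missing data concrete, the following minimal sketch groups samples by which sources they actually have. It is only an illustration of the data layout, not the paper's model: the patient identifiers, the feature values, the source names other than MRI, and the helper `group_by_availability` are all hypothetical. Within each resulting group every sample has the same complete set of sources, which is one common way to work with such data without imputing the missing blocks; the abstract does not state whether the proposed model proceeds this way.

```python
from collections import defaultdict

# Hypothetical toy records: each patient maps a source name to a feature
# vector, or to None when that entire source block is missing.
# "PET" and "CSF" are assumed source names used only for illustration.
records = {
    "p1": {"MRI": [0.2, 0.5], "PET": [1.1], "CSF": [0.3, 0.9]},
    "p2": {"MRI": None, "PET": [0.7], "CSF": [0.4, 0.8]},
    "p3": {"MRI": [0.1, 0.6], "PET": None, "CSF": [0.2, 0.7]},
    "p4": {"MRI": None, "PET": [0.9], "CSF": [0.5, 0.6]},
}


def group_by_availability(records):
    """Group sample ids by the sorted tuple of sources they actually have."""
    groups = defaultdict(list)
    for sample_id, sources in records.items():
        # A source counts as available only if its whole block is present.
        pattern = tuple(
            sorted(name for name, block in sources.items() if block is not None)
        )
        groups[pattern].append(sample_id)
    return dict(groups)


groups = group_by_availability(records)
for pattern, ids in sorted(groups.items()):
    print(pattern, ids)
```

In this toy data, the two patients without an MRI block fall into the same group, so any analysis restricted to that group sees complete PET and CSF data.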