CODA: high dimensional copula discriminant analysis

Authors:
Fang Han;Tuo Zhao;Han Liu
Affiliations:
Department of Biostatistics, Johns Hopkins University, Baltimore, MD;Department of Computer Science, Johns Hopkins University, Baltimore, MD;Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ
Venue:
The Journal of Machine Learning Research
Year:
2013

Citing 8
Cited 0

Correlations and Copulas for Decision and Risk Analysis. Management Science.
On Model Selection Consistency of Lasso. The Journal of Machine Learning Research.
Improved centroids estimation for the nearest shrunken centroid classifier. Bioinformatics.
Hybrid huberized support vector machines for microarray classification and gene selection. Bioinformatics.
Sharp thresholds for high-dimensional and noisy sparsity recovery using l1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory.
The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs. The Journal of Machine Learning Research.
High Dimensional Inverse Covariance Matrix Estimation via Linear Programming. The Journal of Machine Learning Research.
The huge package for high-dimensional undirected graph estimation in R. The Journal of Machine Learning Research.

Abstract

We propose a high dimensional classification method named Copula Discriminant Analysis (CODA). CODA generalizes normal-based linear discriminant analysis to the larger class of Gaussian copula models (the nonparanormal) proposed by Liu et al. (2009). To achieve estimation efficiency and robustness simultaneously, nonparametric rank-based methods, including Spearman's rho and Kendall's tau, are used to estimate the covariance matrix. In high dimensional settings, we prove that the sparsity pattern of the discriminant features can be consistently recovered at the parametric rate, and that the expected misclassification error converges to the Bayes risk. Our theory is backed up by careful numerical experiments, which show that the extra flexibility gained by CODA incurs little efficiency loss even when the data are truly Gaussian. These results suggest that CODA is a viable alternative to normal-based high dimensional linear discriminant analysis.
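The rank-based estimation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard identity for Gaussian copulas, in which the latent correlation equals sin(pi/2 * tau) for Kendall's tau. The function names `kendall_tau` and `rank_based_correlation` are hypothetical and are not taken from the paper or its software, and the sparse discriminant step of CODA is not shown.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau for two equal-length samples (assumes no ties)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            # +1 for a concordant pair, -1 for a discordant pair
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def rank_based_correlation(columns):
    """Estimate the latent correlation matrix via sin(pi/2 * tau).

    `columns` is a list of d samples, one list per variable.
    Assumption: the data follow a Gaussian copula (nonparanormal) model.
    """
    d = len(columns)
    R = [[1.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(j + 1, d):
            tau = kendall_tau(columns[j], columns[k])
            R[j][k] = R[k][j] = math.sin(math.pi / 2 * tau)
    return R


# Illustrative data: correlated Gaussians pushed through monotone maps.
rng = random.Random(0)
rho = 0.6
z1 = [rng.gauss(0, 1) for _ in range(400)]
z2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in z1]
x1 = [math.exp(a) for a in z1]   # strictly increasing transform
x2 = [b ** 3 for b in z2]        # strictly increasing transform

R_latent = rank_based_correlation([z1, z2])
R_observed = rank_based_correlation([x1, x2])
```

Because Kendall's tau depends only on the ordering of the observations, the estimate computed from the transformed data `x1, x2` matches the one computed from the latent Gaussian data `z1, z2`. This invariance is what lets a rank-based estimator handle the unknown marginal transformations of a Gaussian copula model.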