Neural Computation
The goal of sufficient dimension reduction in supervised learning is to find the low-dimensional subspace of input features that contains all of the information about the output values that the input features possess. In this letter, we propose a novel sufficient dimension-reduction method using a squared-loss variant of mutual information as a dependency measure. We apply a density-ratio estimator for approximating squared-loss mutual information that is formulated as a minimum contrast estimator on parametric or nonparametric models. Since cross-validation is available for choosing an appropriate model, our method does not require any prespecified structure on the underlying distributions. We elucidate the asymptotic bias of our estimator on parametric models and the asymptotic convergence rate on nonparametric models. The convergence analysis utilizes the uniform tail-bound of a U-process, and the convergence rate is characterized by the bracketing entropy of the model. We then develop a natural gradient algorithm on the Grassmann manifold for sufficient subspace search. The analytic formula of our estimator allows us to compute the gradient efficiently. Numerical experiments show that the proposed method compares favorably with existing dimension-reduction approaches on artificial and benchmark data sets.
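A minimal sketch of the quantities involved (notation assumed here rather than taken from the abstract: p_{xy}, p_x, p_y denote the joint and marginal densities, W a matrix with orthonormal columns spanning the candidate subspace, and f the objective being maximized). Squared-loss mutual information is commonly defined as the Pearson divergence between the joint density and the product of the marginals,

\[
\mathrm{SMI}(X, Y) = \frac{1}{2} \iint p_x(x)\, p_y(y) \left( \frac{p_{xy}(x, y)}{p_x(x)\, p_y(y)} - 1 \right)^{2} \mathrm{d}x \, \mathrm{d}y ,
\]

which vanishes if and only if X and Y are statistically independent, so the density ratio p_{xy}/(p_x p_y) is the natural target of direct estimation. For the subspace search, the natural gradient of f(W) on the Grassmann manifold (under the column-orthonormal parameterization W^{\top} W = I) is the Euclidean gradient projected onto the horizontal space, \nabla f(W) - W W^{\top} \nabla f(W), with the update taken along the corresponding geodesic; the analytic form of the estimator mentioned above is what makes this gradient inexpensive to evaluate.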