We introduce a new class of quadratic support (QS) functions, many of which already play a crucial role in a variety of applications, including machine learning, robust statistical inference, sparsity promotion, and inverse problems such as Kalman smoothing. Well-known examples of QS penalties include the l2, Huber, l1, and Vapnik losses. We build on a dual representation for QS functions, using it to characterize the conditions under which these functions can be interpreted as negative logs of true probability densities. This interpretation establishes the foundation for statistical modeling with both known and new QS loss functions, and enables the construction of nonsmooth multivariate distributions with specified means and variances from simple scalar building blocks. The main contribution of this paper is a flexible statistical modeling framework for a variety of learning applications, together with a toolbox of efficient numerical methods for estimation. In particular, a broad subclass of QS loss functions known as piecewise linear quadratic (PLQ) penalties admits a dual representation that can be exploited to design interior-point (IP) methods. IP methods solve nonsmooth optimization problems by working directly with smooth systems of equations that characterize their optimality. We provide several numerical examples, along with code that can be used to solve general PLQ problems. The efficiency of the IP approach depends on the structure of the particular application. We consider the class of dynamic inverse problems addressed by Kalman smoothing. This class comprises a wide variety of applications in which the aim is to reconstruct the state of a dynamical system with known process and measurement models from noisy output samples. In the classical setting, Gaussian errors are assumed in both the process and measurement models. We show that the extended framework allows arbitrary PLQ densities to be used, and that the proposed IP approach solves the generalized Kalman smoothing problem while preserving linear complexity in the length of the time series, just as in the Gaussian case. This extends the computational efficiency of the Mayne-Fraser and Rauch-Tung-Striebel algorithms to a much broader nonsmooth setting, and includes many recently proposed robust and sparse smoothers as special cases.
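To make the dual representation mentioned above concrete, the following is a minimal Python sketch (not the paper's solver) that evaluates a scalar PLQ penalty of the form rho(r) = sup_{lo <= u <= hi} u^T(B r + b) - (1/2) u^T M u, under the simplifying assumptions that M is diagonal and the constraint set is a box, so the supremum separates across coordinates and can be computed by clipping. The function name plq_eval and the specific parameterisations are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def plq_eval(r, B, b, m, lo, hi):
    """Evaluate a scalar PLQ penalty via its dual representation
        rho(r) = sup_{lo <= u <= hi}  u^T (B r + b) - 0.5 * sum_i m_i u_i^2,
    assuming a diagonal quadratic term m and box constraints, so the
    supremum separates across coordinates (a sketch, not an IP solver)."""
    w = B * r + b                                  # affine injection of the residual
    val = 0.0
    for wi, mi, li, ui_max in zip(w, m, lo, hi):
        if mi > 0:
            ui = np.clip(wi / mi, li, ui_max)      # unconstrained maximiser, clipped to the box
            val += ui * wi - 0.5 * mi * ui ** 2
        else:
            val += max(li * wi, ui_max * wi)       # linear piece: supremum at a box endpoint
    return val

# Classical losses recovered as PLQ instances (hypothetical parameterisations):
r = 1.7
l2     = plq_eval(r, np.array([1.0]), np.array([0.0]), np.array([1.0]), [-np.inf], [np.inf])
l1     = plq_eval(r, np.array([1.0]), np.array([0.0]), np.array([0.0]), [-1.0], [1.0])
huber  = plq_eval(r, np.array([1.0]), np.array([0.0]), np.array([1.0]), [-1.0], [1.0])   # kappa = 1
vapnik = plq_eval(r, np.array([1.0, -1.0]), np.array([-0.5, -0.5]),
                  np.array([0.0, 0.0]), [0.0, 0.0], [1.0, 1.0])                          # eps = 0.5

print(l2, l1, huber, vapnik)   # approximately 1.445, 1.7, 1.2, 1.2
```

With this parameterisation, the l2, l1, Huber, and Vapnik losses named in the abstract correspond to particular choices of B, b, M, and the box bounds; the general case treated in the paper allows non-separable constraint sets and is handled by interior-point iterations rather than the closed-form clipping used in this sketch.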