Active learning and basis selection for kernel-based linear models: a Bayesian perspective

Authors:
John Paisley, Xuejun Liao, Lawrence Carin
Affiliation:
Department of Electrical and Computer Engineering, Duke University, Durham, NC (all authors)
Venue:
IEEE Transactions on Signal Processing
Year:
2010

Abstract

We develop an active learning algorithm for kernel-based linear regression and classification. The proposed greedy algorithm employs a minimum-entropy criterion derived from a Bayesian interpretation of ridge regression. We assume access to a matrix $\Phi \in \mathbb{R}^{N \times N}$ whose $(i, j)$th element is defined by the kernel function $K(\gamma_i, \gamma_j) \in \mathbb{R}$, with observed data $\gamma_i \in \mathbb{R}^d$. We seek a model $\mathcal{M}: \gamma_i \rightarrow y_i$, where $y_i$ is a real-valued response or integer-valued label that is not available a priori. To achieve this goal, we seek a submatrix $\Phi_{I_l, I_b} \in \mathbb{R}^{n \times m}$ corresponding to the intersection of $n$ rows and $m$ columns of $\Phi$, indexed by the sets $I_l$ and $I_b$, respectively. Typically $m \ll N$ and $n \ll N$.

We have two objectives: (i) determine the $m$ columns of $\Phi$, indexed by the set $I_b$, that are most informative for building a linear model $\mathcal{M}: [1, \Phi_{i, I_b}]^T \rightarrow y_i$, without any knowledge of $\{y_i\}_{i=1}^{N}$; and (ii) using active learning, sequentially determine which subset of $n$ elements of $\{y_i\}_{i=1}^{N}$ should be acquired. Both stopping values, $|I_b| = m$ and $|I_l| = n$, are also inferred from the data. These steps aim to minimize the uncertainty of the model parameters $x$, as measured by the differential entropy of their posterior distribution. The parameter vector $x \in \mathbb{R}^m$ and the model bias $\eta \in \mathbb{R}$ are then learned from the resulting problem, $y_{I_l} = \Phi_{I_l, I_b} x + \eta \mathbf{1} + \epsilon$. The $N - n$ responses/labels not included in $y_{I_l}$ can then be inferred by applying the learned model to the remaining $N - n$ rows of $\Phi_{I_b}$, the columns of $\Phi$ indexed by $I_b$.

We show experimental results for several regression and classification problems and compare with other active learning methods.
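The key property behind objective (ii) is that, under a Gaussian prior and Gaussian noise, the posterior covariance of the weights does not depend on the labels, so every candidate row can be scored before its label is acquired. Adding a row $a_i = [1, \Phi_{i, I_b}]$ lowers the differential entropy by $\tfrac{1}{2}\log(1 + a_i^T \Sigma a_i / \sigma^2)$, so a greedy minimum-entropy rule picks the row with the largest $a_i^T \Sigma a_i$. The NumPy sketch below illustrates only this row-selection step and is not the authors' code. It assumes an RBF kernel, a prior $w \sim \mathcal{N}(0, \alpha^{-1} I)$ on $w = [\eta, x]$ (the bias gets the same prior as $x$), a known noise variance $\sigma^2$, a basis set $I_b$ chosen at random purely for illustration (the paper's column-selection step is not reproduced), and a hypothetical stopping threshold `tol` on the per-step entropy decrease. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel_matrix(X, width=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 width^2)).

    The RBF kernel is an assumption; any kernel K(gamma_i, gamma_j) could be used.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * width ** 2))


def gaussian_entropy(cov):
    """Differential entropy of a Gaussian with covariance cov: 0.5 * log det(2*pi*e*cov)."""
    k = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (k * np.log(2.0 * np.pi * np.e) + logdet)


def greedy_min_entropy_rows(Phi_b, alpha=1.0, noise_var=0.1, max_rows=None, tol=1e-3):
    """Greedily pick rows (labels to acquire) that most reduce the posterior entropy.

    Phi_b     : N x m matrix Phi[:, I_b]; a ones column is prepended for the bias.
    alpha     : prior precision, w ~ N(0, I / alpha) with w = [eta, x] (assumed prior).
    noise_var : Gaussian noise variance sigma^2 (assumed known).
    tol       : hypothetical stopping threshold on the per-step entropy decrease (nats).
    """
    N = Phi_b.shape[0]
    A = np.hstack([np.ones((N, 1)), Phi_b])   # row i is [1, Phi_{i, I_b}]
    Sigma = np.eye(A.shape[1]) / alpha        # prior covariance of w
    max_rows = N if max_rows is None else max_rows
    available = np.ones(N, dtype=bool)
    selected, entropies = [], [gaussian_entropy(Sigma)]
    while len(selected) < max_rows and available.any():
        # a_i^T Sigma a_i for every candidate row; no labels y are needed here
        scores = np.einsum("ij,jk,ik->i", A, Sigma, A)
        scores[~available] = -np.inf
        i = int(np.argmax(scores))
        gain = 0.5 * np.log1p(scores[i] / noise_var)  # entropy drop from adding row i
        if gain < tol:
            break
        # Sherman-Morrison rank-one update of Sigma = (alpha I + A_S^T A_S / sigma^2)^{-1}
        Sa = Sigma @ A[i]
        Sigma = Sigma - np.outer(Sa, Sa) / (noise_var + A[i] @ Sa)
        selected.append(i)
        available[i] = False
        entropies.append(gaussian_entropy(Sigma))
    return A, selected, entropies, Sigma


def posterior_mean(A_sel, y_sel, Sigma, noise_var):
    """Posterior mean of w; equals the ridge solution with penalty alpha * noise_var."""
    return Sigma @ A_sel.T @ y_sel / noise_var


# Toy 1-D regression problem (synthetic data, for illustration only).
rng = np.random.default_rng(0)
N = 200
X = rng.uniform(-3.0, 3.0, size=(N, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(N)
Phi = rbf_kernel_matrix(X, width=0.7)
I_b = rng.choice(N, size=15, replace=False)   # hypothetical basis set, chosen at random
A, sel, ent, Sigma = greedy_min_entropy_rows(Phi[:, I_b], alpha=1.0, noise_var=0.01, max_rows=30)
w = posterior_mean(A[sel], y[sel], Sigma, noise_var=0.01)   # w = [eta, x]
rest = np.setdiff1d(np.arange(N), sel)
y_hat = A[rest] @ w                           # inferred responses for the N - n unlabeled rows
print(len(sel), "labels acquired; final entropy", round(float(ent[-1]), 2))
```

The rank-one update keeps each greedy step at $O(Nk^2)$ for $k = m + 1$ weights instead of re-inverting the posterior precision, and the posterior mean coincides with ordinary ridge regression using penalty $\alpha\sigma^2$, which is the Bayesian reading of ridge regression the abstract refers to.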