Algorithms for subset selection in linear regression

  • Authors:
  • Abhimanyu Das; David Kempe

  • Affiliations:
  • University of Southern California, Los Angeles, CA, USA (both authors)

  • Venue:
  • STOC '08: Proceedings of the fortieth annual ACM symposium on Theory of computing
  • Year:
  • 2008

Abstract

We study the problem of selecting a subset of k random variables to observe that will yield the best linear prediction of another variable of interest, given the pairwise correlations between the observation variables and the variable of interest. Under approximation-preserving reductions, this problem is equivalent to the "sparse approximation" problem of approximating signals concisely. The subset selection problem is NP-hard in general; in this paper, we propose and analyze exact and approximation algorithms for several special cases of practical interest. Specifically, we give an FPTAS when the covariance matrix has constant bandwidth, and exact algorithms when the associated covariance graph, consisting of edges for pairs of variables with non-zero correlation, forms a tree or has a large (known) independent set. Furthermore, we give an exact algorithm when the variables can be embedded into a line such that the covariance decreases exponentially in the distance, and a constant-factor approximation when the variables have no "conditional suppressor variables". Much of our reasoning is based on perturbation results for the R² multiple correlation measure, which is frequently used as a natural goodness-of-fit statistic. It lies at the core of our FPTAS, and also allows us to extend our exact algorithms to approximation algorithms when the covariance matrix "nearly" falls into one of the above classes. We also use our perturbation analysis to prove approximation guarantees for the widely used "Forward Regression" heuristic under the assumption that the observation variables are nearly independent.
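
To make the setup concrete, the following is a minimal Python sketch of the "Forward Regression" heuristic mentioned in the abstract: greedily add, at each step, the observation variable that most increases the R² of the least-squares prediction of the target. This is an illustration, not the authors' implementation; the names (X, y, k) and the use of plain least squares via numpy are assumptions for the example.

    # A minimal sketch, assuming zero-mean variables (matching the
    # paper's random-variable setting); X, y, k are illustrative names.
    import numpy as np

    def r_squared(X_sub, y):
        # R^2 of the least-squares fit of y on the columns of X_sub.
        coef, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
        residual = y - X_sub @ coef
        return 1.0 - (residual @ residual) / (y @ y)

    def forward_regression(X, y, k):
        # Greedily pick k column indices of X, each time adding the
        # candidate whose inclusion maximizes R^2.
        selected, remaining = [], set(range(X.shape[1]))
        for _ in range(k):
            best = max(remaining,
                       key=lambda j: r_squared(X[:, selected + [j]], y))
            selected.append(best)
            remaining.remove(best)
        return selected

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.standard_normal((200, 10))
        y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.standard_normal(200)
        print(forward_regression(X, y, 2))  # should recover columns 3 and 7

Per the abstract, the paper's approximation guarantee for this greedy heuristic applies when the observation variables are nearly independent; the sketch makes no such guarantee in general.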