One of the crucial tasks in many inference problems is the extraction of an underlying sparse graphical model from a given number of high-dimensional measurements. In machine learning, this is frequently achieved using, as a penalty term, the Lp norm of the model parameters, with p ≤ 1 for efficient dilution. Here we propose a statistical-mechanics analysis of the problem in the setting of perceptron memorization and generalization. Using a replica approach, we evaluate the relative performance of naive dilution (learning without dilution, followed by applying a threshold to the model parameters), L1 dilution (frequently used in convex optimization), and L0 dilution (optimal but computationally hard to implement). Whereas both Lp-diluted approaches clearly outperform the naive approach, we find a small region where L0 works almost perfectly and strongly outperforms the simpler-to-implement L1 dilution. In the second part we propose an efficient message-passing strategy for the simpler case of discrete classification vectors, where the L0 norm coincides with the L1 norm. Some examples are discussed.
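To make the distinction between the two practical schemes concrete, the sketch below (not from the paper; all names such as fit_l1, fit_plain, lam, and tau, and every numerical setting, are illustrative assumptions) contrasts naive dilution (train densely, then zero small couplings) with L1 dilution via proximal gradient descent (ISTA) on a synthetic teacher-student perceptron with a sparse teacher vector.

    # Minimal sketch: naive dilution vs. L1 dilution on a sparse
    # teacher-student perceptron. Illustrative only; hyperparameters
    # (lam, tau, lr, iters) are ad-hoc assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    N, K, P = 100, 10, 400          # input dim, true nonzeros, patterns

    # Sparse teacher vector J0 and labels y = sign(X . J0)
    J0 = np.zeros(N)
    J0[rng.choice(N, K, replace=False)] = rng.normal(size=K)
    X = rng.normal(size=(P, N))
    y = np.sign(X @ J0)

    def logistic_grad(w, X, y):
        # gradient of the mean logistic loss log(1 + exp(-y * x.w))
        m = np.clip(y * (X @ w), -50, 50)   # clip to avoid overflow
        return -(y / (1.0 + np.exp(m))) @ X / len(y)

    def soft_threshold(w, t):
        # proximal operator of t * ||w||_1
        return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

    def fit_plain(X, y, lr=0.5, iters=3000):
        # plain gradient descent, no dilution during learning
        w = np.zeros(X.shape[1])
        for _ in range(iters):
            w -= lr * logistic_grad(w, X, y)
        return w

    def fit_l1(X, y, lam, lr=0.5, iters=3000):
        # ISTA: gradient step on the loss, then the L1 prox
        w = np.zeros(X.shape[1])
        for _ in range(iters):
            w = soft_threshold(w - lr * logistic_grad(w, X, y), lr * lam)
        return w

    # Naive dilution: learn densely, then zero out small couplings
    w_naive = fit_plain(X, y)
    tau = 0.2 * np.max(np.abs(w_naive))     # ad-hoc threshold
    w_naive[np.abs(w_naive) < tau] = 0.0

    # L1 dilution: sparsity enforced during learning
    w_l1 = fit_l1(X, y, lam=0.05)

    true_support = set(np.flatnonzero(J0))
    for name, w in [("naive", w_naive), ("L1", w_l1)]:
        sup = set(np.flatnonzero(w))
        print(name, "nonzeros:", len(sup),
              "true positives:", len(sup & true_support))

The qualitative difference is that the soft-threshold step holds couplings exactly at zero throughout learning, whereas naive dilution only prunes after the dense solution has already spread weight across irrelevant couplings.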