The p-norm generalization of the LMS algorithm for adaptive filtering

  • Authors:
  • J. Kivinen; M. K. Warmuth; B. Hassibi

  • Affiliations:
  • Dept. of Comput. Sci., Univ. of Helsinki, Finland

  • Venue:
  • IEEE Transactions on Signal Processing
  • Year:
  • 2006


Abstract

Recently, much work has been done on analyzing online machine learning algorithms in a worst-case setting, where no probabilistic assumptions are made about the data. This is analogous to the H∞ setting used in adaptive linear filtering. Bregman divergences have become a standard tool for analyzing online machine learning algorithms. Using these divergences, we motivate a generalization of the least mean squares (LMS) algorithm. The loss bounds for these so-called p-norm algorithms involve norms other than the standard 2-norm. The bounds can be significantly better if a large proportion of the input variables are irrelevant, i.e., if the weight vector we are trying to learn is sparse. We also prove results for nonstationary targets. We only know how to apply kernel methods to the standard LMS algorithm (i.e., p=2). However, even in the general p-norm case, we can handle generalized linear models where the output of the system is a linear function combined with a nonlinear transfer function (e.g., the logistic sigmoid).
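The sketch below is a minimal illustration (not the paper's exact algorithm) of how a p-norm generalization of LMS can be realized with the link function f(w) = ∇(½‖w‖_p²), whose component-wise form is sign(w_i)|w_i|^(p−1)/‖w‖_p^(p−2); the inverse map is the same link with the dual exponent q, where 1/p + 1/q = 1. The function names, step size, and the simplified squared-loss update are illustrative assumptions; the paper's full treatment uses Bregman divergences and also covers nonlinear transfer functions.

```python
import numpy as np

def link(v, p):
    """Gradient of (1/2)||v||_p^2: sign(v_i)|v_i|^(p-1) / ||v||_p^(p-2) per component."""
    norm = np.linalg.norm(v, ord=p)
    if norm == 0.0:
        return np.zeros_like(v)
    return np.sign(v) * np.abs(v) ** (p - 1) / norm ** (p - 2)

def p_norm_lms(X, y, p=2.0, eta=0.1):
    """One pass of a p-norm LMS-style update; p = 2 recovers the ordinary LMS rule."""
    q = p / (p - 1)            # dual exponent, 1/p + 1/q = 1
    n, d = X.shape
    w = np.zeros(d)
    for t in range(n):
        y_hat = w @ X[t]
        theta = link(w, p)                     # map weights into the dual space
        theta -= eta * (y_hat - y[t]) * X[t]   # gradient step on the squared loss
        w = link(theta, q)                     # map back via the dual link function
    return w
```

For p = 2 the link function is the identity and the update reduces to the familiar LMS rule w ← w + η(y_t − ŷ_t)x_t; choosing p closer to values like 2 ln d (for d inputs) is what yields the improved bounds when the target weight vector is sparse.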