Efficiency versus convergence of Boolean kernels for on-line learning algorithms

Authors: Roni Khardon; Dan Roth; Rocco A. Servedio
Affiliations: Department of Computer Science, Tufts University, Medford, MA; Department of Computer Science, University of Illinois, Urbana, IL; Department of Computer Science, Columbia University, New York, NY
Venue: Journal of Artificial Intelligence Research
Year: 2005

Abstract

The paper studies machine learning problems where each example is described using a set of Boolean features and where hypotheses are represented by linear threshold elements. One method of increasing the expressiveness of learned hypotheses in this context is to expand the feature set to include conjunctions of basic features. This can be done explicitly or, where possible, by using a kernel function. Focusing on the well-known Perceptron and Winnow algorithms, the paper demonstrates a tradeoff between the computational efficiency with which the algorithm can be run over the expanded feature space and the generalization ability of the corresponding learning algorithm. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Perceptron algorithm over a feature space of exponentially many conjunctions; however, we also show that using such kernels, the Perceptron algorithm can provably make an exponential number of mistakes even when learning simple functions. We then consider the question of whether kernel functions can analogously be used to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions. Known upper bounds imply that the Winnow algorithm can learn Disjunctive Normal Form (DNF) formulae with a polynomial mistake bound in this setting. However, we prove that it is computationally hard to simulate Winnow's behavior for learning DNF over such a feature set. This implies that the kernel functions which correspond to running Winnow for this problem are not efficiently computable, and that there is no general construction that can run Winnow with kernels.
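The efficient Perceptron side of the tradeoff can be made concrete with a minimal Python sketch. It assumes the all-conjunctions kernel K(x, y) = 2^same(x, y), where same(x, y) counts the positions on which bit vectors x and y agree. A conjunction (each variable positive, negated, or absent) is satisfied by both x and y only if every included literal sits on an agreeing position and matches it. That gives two choices (include the matching literal, or omit the variable) per agreeing position and one choice (omit) otherwise. The function names, the {-1, +1} label convention, the toy target, and the pass limit are illustrative assumptions, not the authors' code. The brute-force feature map enumerates all 3^n conjunctions and is included only to check the kernel for tiny n.

```python
from itertools import product


def same(x, y):
    """Number of positions on which bit vectors x and y agree."""
    return sum(1 for a, b in zip(x, y) if a == b)


def all_conj_kernel(x, y):
    """Count of conjunctions (each variable positive, negated, or absent)
    satisfied by both x and y, which equals 2 ** same(x, y)."""
    return 2 ** same(x, y)


def explicit_features(x):
    """Brute-force 0/1 feature vector over all 3**n conjunctions.
    Only used to sanity-check the kernel for small n."""
    feats = []
    for pattern in product((None, 0, 1), repeat=len(x)):
        # pattern[i] is None (variable absent) or the bit the literal requires
        ok = all(p is None or xi == p for p, xi in zip(pattern, x))
        feats.append(1 if ok else 0)
    return feats


def predict(mistakes, kernel, x):
    """Dual-form prediction: sign of sum_j y_j * K(x_j, x) over stored mistakes."""
    score = sum(yj * kernel(xj, x) for xj, yj in mistakes)
    return 1 if score > 0 else -1


def kernel_perceptron(examples, kernel, max_passes=200):
    """Cycle through (x, y) pairs with y in {-1, +1}. On each mistake, store
    the example (equivalent to w += y * phi(x) in the conjunction space)."""
    mistakes = []
    for _ in range(max_passes):
        clean = True
        for x, y in examples:
            if predict(mistakes, kernel, x) != y:
                mistakes.append((x, y))
                clean = False
        if clean:  # a full pass with no mistakes: hypothesis is consistent
            break
    return mistakes


# Toy usage (hypothetical target): f(x) = (x0 AND x1) OR (NOT x2) on n = 3.
points = list(product((0, 1), repeat=3))
data = [(x, 1 if (x[0] and x[1]) or not x[2] else -1) for x in points]
model = kernel_perceptron(data, all_conj_kernel)
```

Each kernel evaluation costs O(n) time instead of O(3^n), which is the efficiency half of the tradeoff. The abstract's lower bound concerns the other half: for some simple targets the number of stored mistakes can grow exponentially in n, which a toy run like this one does not exhibit.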
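Unlike the Perceptron case, Winnow's multiplicative updates are simulated here the naive way, by keeping one explicit weight per conjunction. The sketch below assumes the common promote/demote-by-alpha variant with 0/1 labels, initial weights of 1, and threshold theta equal to the number of features N = 3^n. These parameter choices, the helper names, and the toy target are illustrative assumptions, not the paper's construction. Every example touches all N weights, so the time per example is exponential in n. Winnow's mistake bound for a target that is a disjunction of k features grows only with k log N, and log N is linear in n here. The abstract's hardness result says that, for DNF over this feature set, the explicit weight vector cannot in general be replaced by an efficiently computable kernel.

```python
from itertools import product


def conj_patterns(n):
    """All 3**n conjunctions: per variable None (absent), 0 (negated), or 1 (positive)."""
    return list(product((None, 0, 1), repeat=n))


def features(x, patterns):
    """Explicit 0/1 feature vector: 1 where the conjunction is satisfied by x."""
    return [1 if all(p is None or xi == p for p, xi in zip(pat, x)) else 0
            for pat in patterns]


def winnow(examples, patterns, alpha=2.0, max_passes=200):
    """Winnow with labels in {0, 1}: predict 1 iff w . f(x) >= theta.
    False negative: multiply active weights by alpha.
    False positive: divide active weights by alpha."""
    theta = float(len(patterns))  # assumed threshold: N = number of features
    w = [1.0] * len(patterns)     # one weight per conjunction: exponential in n
    for _ in range(max_passes):
        clean = True
        for x, y in examples:
            f = features(x, patterns)
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) >= theta else 0
            if pred != y:
                clean = False
                factor = alpha if y == 1 else 1.0 / alpha
                w = [wi * factor if fi else wi for wi, fi in zip(w, f)]
        if clean:  # a full pass with no mistakes
            break
    return w, theta


# Toy usage (hypothetical DNF target): f(x) = (x0 AND x1) OR (NOT x2) on n = 3.
pats = conj_patterns(3)
data = [(x, 1 if (x[0] and x[1]) or not x[2] else 0)
        for x in product((0, 1), repeat=3)]
weights, theta = winnow(data, pats)
```

In the expanded space the toy DNF is a disjunction of just two features, (x0 AND x1) and (NOT x2), which is the regime where Winnow's logarithmic dependence on N applies.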