Efficient noise-tolerant learning from statistical queries
STOC '93 Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing
Weakly learning DNF and characterizing statistical query learning using Fourier analysis
STOC '94 Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing
Specification and simulation of statistical query algorithms for efficiency and noise tolerance
COLT '95 Proceedings of the Eighth Annual Conference on Computational Learning Theory
Neural Networks for Pattern Recognition
Oxford University Press, 1995
On Learning Correlated Boolean Functions Using Statistical Queries
ALT '01 Proceedings of the 12th International Conference on Algorithmic Learning Theory
On the Efficiency of Noise-Tolerant PAC Algorithms Derived from Statistical Queries
COLT '00 Proceedings of the Thirteenth Annual Conference on Computational Learning Theory
Characterizing statistical query learning: simplified notions and proofs
ALT '09 Proceedings of the 20th International Conference on Algorithmic Learning Theory
Spectral norm in learning theory: some selected topics
ALT '06 Proceedings of the 17th International Conference on Algorithmic Learning Theory
We prove two lower bounds in the Statistical Query (SQ) learning model. The first lower bound concerns weak learning: we prove that learning a concept class of SQ-dimension d requires running time Ω(d/log d). The SQ-dimension of a concept class is the maximum number of concepts that are "uniformly correlated", in the sense that every pair of them has nearly the same correlation. This lower bound matches the upper bound of [BFJ+94] up to a logarithmic factor. We prove it against an "honest SQ-oracle", which yields a stronger result than bounds against the more commonly used "adversarial SQ-oracles". The second lower bound is more general: it gives a continuous trade-off between the "advantage" of an algorithm in learning the target function and the number of queries the algorithm must make, where the advantage is the probability that the algorithm succeeds in predicting a label minus the probability that it fails. Both lower bounds extend and/or strengthen previous results and solve an open problem left in [Y01].
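For readers new to the terminology, the following LaTeX sketch spells out the notions the abstract uses. The notation (the distribution D, the correlation ⟨·,·⟩_D, the common value λ, the 1/d^3 tolerance, and the query parameters M and τ) is assumed here for illustration, following standard SQ-learning conventions rather than the paper's exact wording.

% Correlation of two \pm 1-valued concepts f and g under distribution D:
\[
  \langle f, g \rangle_D \;=\; \mathbb{E}_{x \sim D}\bigl[f(x)\,g(x)\bigr].
\]

% SQ-dimension: the largest d such that some f_1, ..., f_d in the class
% are "uniformly correlated", i.e. all pairwise correlations lie in a
% small window around a common value \lambda (the 1/d^3 tolerance is
% the convention of [BFJ+94], assumed here):
\[
  \bigl|\langle f_i, f_j \rangle_D - \lambda\bigr| \;\le\; \frac{1}{d^{3}}
  \qquad \text{for all } i \neq j.
\]

% Advantage of a learning algorithm A, as defined in the abstract:
\[
  \mathrm{adv}(A) \;=\; \Pr[\text{$A$ predicts the label correctly}]
                 \;-\; \Pr[\text{$A$ predicts it incorrectly}].
\]

% Honest versus adversarial SQ-oracle on a query function g with sample
% size M and tolerance \tau: the honest oracle returns an empirical
% average over M fresh examples; the adversarial oracle may return any
% value within \tau of the true expectation:
\[
  \text{honest: } \frac{1}{M}\sum_{i=1}^{M} g\bigl(x_i, f(x_i)\bigr),
  \qquad
  \text{adversarial: any } v \text{ with }
  \Bigl|\,v - \mathbb{E}_{x \sim D}\bigl[g(x, f(x))\bigr]\Bigr| \le \tau.
\]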