We consider a model of teaching in which the learners are consistent and have bounded state, but are otherwise arbitrary. The teacher is non-interactive and "massively open": the teacher broadcasts a sequence of examples of an arbitrary target concept, intended for every possible on-line learning algorithm to learn from. We focus on the problem of designing interesting teachers: efficient sequences of examples that lead all capable and consistent learners to learn concepts, regardless of the underlying algorithm used by the learner. We use two measures of teaching efficiency: the number of mistakes made by the worst-case learner, and the maximum length of the example sequence needed by the worst-case learner. Our results are summarized as follows.

Given a uniform random sequence of examples of an n-bit concept function, learners (capable of consistently learning the concept) with s(n) bits of state are guaranteed to make only O(n ⋅ s(n)) mistakes and exactly learn the concept, with high probability. This theorem has interesting corollaries; for instance, every concept c has a sequence of examples that teaches c to all capable consistent on-line learners implementable with s(n)-size circuits, such that every such learner makes only Õ(s(n)^2) mistakes. That is, all resource-bounded algorithms capable of consistently learning a concept can be simultaneously taught that concept with few mistakes, on a single example sequence.

We also show how to efficiently generate such a sequence of examples on-line: using Nisan's pseudorandom generator, each example in the sequence can be generated with polynomial-time overhead per example, with an O(n ⋅ s(n))-bit initial seed.

To justify our use of randomness, we prove that any non-trivial derandomization of our sequences would imply new circuit lower bounds. For instance, if there is a deterministic 2^(n^O(1))-time algorithm that generates a sequence of examples such that all consistent and capable polynomial-size circuit learners learn the all-zeroes concept with fewer than 2^n mistakes, then EXP ⊄ P/poly.

We present examples illustrating that the key differences in our model -- our focus on mistakes rather than on the total number of examples, and our use of a state bound -- must be considered together to obtain our results.

Finally, we show that for every consistent learner A with s(n) bits of state, and every n-bit concept that A is capable of learning, there is a custom "tutoring" sequence of only O(n ⋅ s(n)) examples that teaches A the concept. That is, in principle, there are no slow learners, only bad teachers: if a state-bounded learner is capable of learning a concept at all, then it can always be taught that concept quickly via some short sequence of examples.
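
To make the broadcast model concrete, the following minimal Python sketch pits one particular consistent, state-bounded learner -- the classic elimination algorithm for monotone conjunctions -- against a teacher that broadcasts uniform random examples. The learner, the constants N and TARGET, and the mistake-counting loop are our illustrative choices, not the paper's construction; the theorem above quantifies over all consistent learners with s(n) bits of state.

```python
import random

N = 8                        # number of Boolean attributes (illustrative)
TARGET = {1, 4, 6}           # hypothetical target: the conjunction x1 AND x4 AND x6

def concept(x):
    """Label of example x under the target concept."""
    return int(all(x[i] for i in TARGET))

class EliminationLearner:
    """A consistent on-line learner for monotone conjunctions whose state
    (the set of variables still in its hypothesis) fits in O(n) bits."""
    def __init__(self, n):
        self.relevant = set(range(n))          # most specific hypothesis
    def predict(self, x):
        return int(all(x[i] for i in self.relevant))
    def update(self, x, y):
        if y == 1:                             # a positive example removes
            self.relevant = {i for i in self.relevant if x[i]}

random.seed(0)
learner = EliminationLearner(N)
mistakes = 0
for _ in range(10_000):                        # the broadcast: uniform random examples
    x = tuple(random.randint(0, 1) for _ in range(N))
    y = concept(x)
    mistakes += int(learner.predict(x) != y)
    learner.update(x, y)
print("mistakes:", mistakes)
```

Each mistake this particular learner makes is on a positive example that falsifies at least one extraneous variable, so it makes at most N - |TARGET| mistakes on any stream; the paper's O(n ⋅ s(n)) bound is the analogous guarantee for every consistent s(n)-bit-state learner at once.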
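The on-line generation result relies on Nisan's pseudorandom generator for space-bounded computation. Below is a sketch of the shape of that generator, assuming a pairwise-independent hash family h(x) = (ax + b) mod p over a prime modulus; the concrete parameters, the block-to-example decoding, and the interface are our simplifications, not the paper's. The point of block() is the property the abstract uses: any single output block -- hence any single example -- is computable directly from the short seed with only O(K) hash evaluations of overhead.

```python
import random

P = (1 << 61) - 1            # a Mersenne prime; blocks are integers mod P
K = 20                       # the generator stretches one block to 2^K blocks

def sample_seed(rng):
    """Seed = one starting block plus K pairwise-independent hash functions
    h_j(x) = (a_j * x + b_j) mod P -- O(K log P) bits in total."""
    x0 = rng.randrange(P)
    hashes = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(K)]
    return x0, hashes

def block(i, x0, hashes):
    """Compute the i-th output block (0 <= i < 2^K) directly from the seed.

    Nisan's recursion G_j(x) = G_{j-1}(x) || G_{j-1}(h_j(x)) means the
    bits of the index i select which hash functions to apply."""
    x = x0
    for j in range(K - 1, -1, -1):
        if (i >> j) & 1:
            a, b = hashes[j]
            x = (a * x + b) % P
    return x

rng = random.Random(0)
x0, hashes = sample_seed(rng)
print([block(i, x0, hashes) for i in range(4)])   # first few pseudorandom blocks
```

In the teaching application, each block would be decoded into an n-bit example and labeled by the target concept before broadcasting; we print raw blocks for brevity.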
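Finally, the tutoring result is existential, and one way to see why a short sequence must exist is that a learner with s(n) bits of state has at most 2^s(n) reachable states. The brute-force sketch below searches that state graph breadth-first for a shortest teaching sequence; step and predict are a hypothetical learner interface of ours, the learned() test is one reasonable formalization of exact learning, and the search takes time exponential in n and the state bound, so this illustrates existence rather than the paper's argument.

```python
from collections import deque
from itertools import product

def find_tutoring_sequence(step, predict, start, concept, n):
    """Shortest sequence of labeled examples driving the learner to a state
    whose hypothesis agrees with `concept` on all 2^n inputs.

    `step(state, x, y) -> state` and `predict(state, x) -> y` are a
    hypothetical interface for a state-bounded learner; states must be
    hashable. Returns None if no reachable state has learned the concept."""
    inputs = list(product((0, 1), repeat=n))

    def learned(s):
        return all(predict(s, x) == concept(x) for x in inputs)

    seen, queue = {start}, deque([(start, [])])
    while queue:
        state, seq = queue.popleft()
        if learned(state):
            return seq
        for x in inputs:                       # feed each correctly labeled example
            nxt = step(state, x, concept(x))
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, seq + [(x, concept(x))]))
    return None

# Example with the elimination learner from the first sketch (state = frozenset):
#   step = lambda s, x, y: frozenset(i for i in s if x[i]) if y else s
#   predict = lambda s, x: int(all(x[i] for i in s))
#   find_tutoring_sequence(step, predict, frozenset(range(3)),
#                          lambda x: int(x[0] and x[2]), 3)
# returns the one-example sequence [((1, 0, 1), 1)].
```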