Learning k-modal distributions via testing

  • Authors:
  • Constantinos Daskalakis; Ilias Diakonikolas; Rocco A. Servedio

  • Affiliations:
  • MIT; UC Berkeley; Columbia University

  • Venue:
  • Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year:
  • 2012

Abstract

A k-modal probability distribution over the domain {1, ..., n} is one whose histogram has at most k "peaks" and "valleys." Such distributions are natural generalizations of monotone (k = 0) and unimodal (k = 1) probability distributions, which have been intensively studied in probability theory and statistics. In this paper we consider the problem of learning an unknown k-modal distribution. The learning algorithm is given access to independent samples drawn from the k-modal distribution p, and must output a hypothesis distribution p̂ such that with high probability the total variation distance between p̂ and p is at most ε. We give an efficient algorithm for this problem that runs in time poly(k, log(n), 1/ε). For k ≤ Õ(√(log n)), the number of samples used by our algorithm is very close (within an Õ(log(1/ε)) factor) to being information-theoretically optimal. Prior to this work, computationally efficient algorithms were known only for the cases k = 0, 1 [Bir87b, Bir97]. A novel feature of our approach is that our learning algorithm crucially uses a new property testing algorithm as a key subroutine. The learning algorithm uses the property tester to efficiently decompose the k-modal distribution into k (near-)monotone distributions, which are easier to learn.
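
To make the decompose-via-testing idea concrete, below is a minimal Python sketch of the "test, then learn piecewise" pipeline the abstract describes: grow an interval until a monotonicity check rejects, close the piece, flip direction, and repeat. Everything here is a hypothetical stand-in for illustration; the crude plug-in check looks_monotone, the greedy interval loop, and all thresholds are assumptions, not the paper's actual (far more sample-efficient) tester or its analysis.

    import random
    from collections import Counter

    def empirical_pdf(samples, lo, hi):
        """Empirical distribution of the samples that land in [lo, hi]."""
        counts = Counter(s for s in samples if lo <= s <= hi)
        total = sum(counts.values()) or 1
        return [counts.get(i, 0) / total for i in range(lo, hi + 1)]

    def looks_monotone(pdf, direction):
        """Crude plug-in check that pdf is (near-)monotone in the given
        direction (+1 nondecreasing, -1 nonincreasing). A hypothetical
        stand-in for the paper's sample-efficient property tester."""
        violations = sum(1 for a, b in zip(pdf, pdf[1:])
                         if (b - a) * direction < -1e-3)
        return violations <= max(1, len(pdf) // 20)  # tolerate a little noise

    def decompose(samples, n):
        """Greedily grow an interval until the tester rejects, then close
        the piece and flip direction; a k-modal source should yield about
        k + 1 (near-)monotone pieces."""
        pieces, lo, direction = [], 1, +1
        for hi in range(1, n + 1):
            if not looks_monotone(empirical_pdf(samples, lo, hi), direction):
                pieces.append((lo, hi - 1, direction))
                lo, direction = hi, -direction
        pieces.append((lo, n, direction))
        return pieces

    if __name__ == "__main__":
        random.seed(0)
        n = 40
        # Unimodal (k = 1) target: mass rises to a peak near 20, then falls.
        weights = list(range(1, 21)) + list(range(20, 0, -1))
        samples = random.choices(range(1, n + 1), weights=weights, k=50_000)
        for lo, hi, d in decompose(samples, n):
            print(f"[{lo:2d}, {hi:2d}] "
                  f"{'nondecreasing' if d > 0 else 'nonincreasing'}")
        # Each printed piece would then be handed to a monotone-distribution
        # learner (e.g., a Birge-style histogram learner) and the results
        # stitched together into the final hypothesis p-hat.

On the unimodal example above, the sketch should split the domain into roughly two monotone pieces, one rising and one falling; the paper's contribution is doing this decomposition, and the subsequent learning of each piece, with near-optimal sample complexity.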