Concentration inequalities for the missing mass and for histogram rule error

  • Authors:
  • David McAllester; Luis Ortiz

  • Affiliations:
  • Toyota Technological Institute at Chicago, 1427 East 60th Street, Chicago, IL; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA

  • Venue:
  • The Journal of Machine Learning Research
  • Year:
  • 2003

Abstract

This paper gives distribution-free concentration inequalities for the missing mass and for the error rate of histogram rules. Negative association methods can be used to reduce these concentration problems to concentration questions about independent sums. Although independent, these sums are highly heterogeneous, and such sums cannot be analyzed using standard concentration inequalities such as Hoeffding's inequality, the Angluin-Valiant bound, Bernstein's inequality, Bennett's inequality, or McDiarmid's theorem. The concentration inequality for histogram rule error is motivated by the desire to construct a new class of bounds on the generalization error of decision trees.
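To make the central quantity concrete: the missing mass of a sample is the total probability of the outcomes that never appear in it. The following is a minimal Python sketch, not the authors' method, that draws repeated i.i.d. samples from a hypothetical heterogeneous distribution (a few heavy outcomes plus many rare ones), measures the missing mass of each sample, and compares the empirical mean with the exact expectation, the sum over outcomes i of p_i (1 - p_i)^m. The distribution, the sample size `m`, the number of trials, and the helper names (`missing_mass`, `expected_missing_mass`, `simulate`) are illustrative assumptions. The spread across trials gives an informal sense of the concentration the paper bounds; the simulation proves nothing about the bounds themselves.

```python
import random
import statistics


def missing_mass(probs, sample):
    """Total probability of the outcomes that never appear in `sample`."""
    seen = set(sample)
    return sum(p for outcome, p in enumerate(probs) if outcome not in seen)


def expected_missing_mass(probs, m):
    """Exact expectation for an i.i.d. sample of size m: sum_i p_i * (1 - p_i)^m."""
    return sum(p * (1.0 - p) ** m for p in probs)


def simulate(probs, m, trials, seed=0):
    """Draw `trials` i.i.d. samples of size m and return the missing mass of each."""
    rng = random.Random(seed)
    outcomes = range(len(probs))
    masses = []
    for _ in range(trials):
        sample = rng.choices(outcomes, weights=probs, k=m)
        masses.append(missing_mass(probs, sample))
    return masses


# Hypothetical heterogeneous distribution (an assumption for illustration):
# 5 heavy outcomes with probability 0.1 each plus 1000 rare outcomes with
# probability 0.0005 each, so the probabilities sum to 1.
probs = [0.1] * 5 + [0.0005] * 1000
masses = simulate(probs, m=500, trials=200)

print(f"expected missing mass:  {expected_missing_mass(probs, 500):.3f}")
print(f"empirical mean:         {statistics.mean(masses):.3f}")
print(f"empirical std (trials): {statistics.stdev(masses):.4f}")
```

Nearly all of the missing mass here comes from the many rare outcomes, each contributing a tiny, differently weighted term. This is the kind of heterogeneity the abstract refers to.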