On domain-partitioning induction criteria: worst-case bounds for the worst-case based

  • Authors: Richard Nock; Frank Nielsen
  • Affiliations: Grimaag-Département Scientifique Interfacultaire, Université des Antilles-Guyane, Campus de Schoelcher, BP 7209, 97275 Schoelcher, Martinique, France; SONY CS Labs Inc., 3-14-13 Higashi Gotanda, Shinagawa-Ku, Tokyo 141-0022, Japan
  • Venue: Theoretical Computer Science
  • Year: 2004

Abstract

One of the most popular induction schemes for supervised learning is also one of the oldest. It builds a classifier in a top-down fashion by minimizing a so-called index criterion. While numerous papers have reported experiments on this scheme, little was known about its theoretical aspects until recent work on decision trees and branching programs that uses a powerful classification tool: boosting.

In this paper, we look at this problem from a worst-case computational (rather than informational) standpoint. Our conclusions on the ranking of these index criteria for minimization follow almost exactly those of boosting (with matching upper and lower bounds), and extend to more classes of Boolean formulas, such as decision lists, multilinear polynomials and symmetric functions. Our results also exhibit a strong worst case for the induction scheme: we build particularly hard samples for which replacing most index criteria, or the class of concept representation, makes no difference at all to the concept induced, even when the replacement yields the same ranking of the indexes as boosting does. This is clearly not a limitation of previous analyses, but a consequence of the induction scheme.
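For readers unfamiliar with the top-down scheme the abstract refers to, the following is a minimal sketch of a single induction step: pick the Boolean feature whose split minimizes a weighted index criterion. The three indexes shown (Gini, entropy, and a square-root index) are common choices in the decision-tree literature and are assumptions here, since the abstract does not name the criteria it studies. The helper names `weighted_index` and `best_split` and the toy sample are hypothetical.

```python
import math

# Candidate index criteria on p, the fraction of positive examples in a node.
# These particular functions are illustrative assumptions, not taken from the paper.
def gini(p):
    return 2.0 * p * (1.0 - p)

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def sqrt_index(p):
    return 2.0 * math.sqrt(p * (1.0 - p))

def weighted_index(examples, feature, index):
    """Average index of the two children obtained by splitting on `feature`.

    `examples` is a list of (bits, label) pairs with bits a tuple of 0/1
    values and label in {0, 1}.
    """
    total = len(examples)
    score = 0.0
    for value in (0, 1):
        labels = [y for x, y in examples if x[feature] == value]
        if labels:
            p = sum(labels) / len(labels)
            score += len(labels) / total * index(p)
    return score

def best_split(examples, n_features, index):
    """One top-down step: the feature minimizing the weighted index."""
    return min(range(n_features),
               key=lambda f: weighted_index(examples, f, index))

# Toy sample: the label equals feature 0, feature 1 is irrelevant.
sample = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
choices = [best_split(sample, 2, idx) for idx in (gini, entropy, sqrt_index)]
```

On this toy sample all three indexes select the same feature. That only illustrates the mechanics of a split; it is not a reproduction of the hard samples constructed in the paper.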