A similarity measure to assess the stability of classification trees

  • Authors:
  • Bénédicte Briand;Gilles R. Ducharme;Vanessa Parache;Catherine Mercat-Rommens

  • Affiliations:
  • IRSN, Laboratoire d'Etudes Radioécologiques en milieux Continental et Marin, DEI/SESURE/LERCM, Cadarache, bât. 153, BP 3, 13115 Saint-Paul-lez-Durance, France;Equipe de probabilités et statistique, I3M, Université Montpellier II, cc 051, Place Eugène Bataillon, 34095 Montpellier Cedex 5, France;IRSN, Laboratoire d'Etudes Radioécologiques en milieux Continental et Marin, DEI/SESURE/LERCM, Cadarache, bât. 153, BP 3, 13115 Saint-Paul-lez-Durance, France;IRSN, Laboratoire d'Etudes Radioécologiques en milieux Continental et Marin, DEI/SESURE/LERCM, Cadarache, bât. 153, BP 3, 13115 Saint-Paul-lez-Durance, France

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2009

Abstract

It has been recognized that classification trees (CART) are unstable: a small perturbation in the input variables or a fresh sample can lead to a very different classification tree. Some approaches exist that try to correct this instability. However, their benefits can, at present, be appreciated only qualitatively. A similarity measure between two classification trees is introduced that quantifies their closeness. Its usefulness is illustrated with synthetic data on the impact of radioactive deposits in the environment. In this context, a modified node-level stabilizing technique, referred to as the NLS-REP method, is introduced and shown to be more stable than the classical CART method.
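
The instability described in the abstract can be observed with a small experiment. The sketch below is only an illustration and does not reproduce the similarity measure or the NLS-REP method of the paper, neither of which is specified in the abstract. It assumes a toy Gini-based tree learner, made-up synthetic data, and a simple stand-in for closeness: the fraction of reference points on which two trees predict the same class. All function names (`grow`, `predict`, `agreement`, and so on) are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows):
    """Return the (feature, threshold) pair that lowers weighted Gini, or None."""
    best, best_score = None, gini([y for _, y in rows])
    n = len(rows)
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best

def grow(rows, depth=0, max_depth=3, min_size=5):
    """Grow a toy tree: a leaf is a class label, a node is (f, t, left, right)."""
    labels = [y for _, y in rows]
    majority = int(2 * sum(labels) >= len(labels))
    split = None
    if depth < max_depth and len(rows) >= min_size:
        split = best_split(rows)
    if split is None:
        return majority
    f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t,
            grow(left, depth + 1, max_depth, min_size),
            grow(right, depth + 1, max_depth, min_size))

def predict(tree, x):
    """Follow the splits down to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def agreement(tree_a, tree_b, points):
    """Stand-in closeness score: share of points given the same class by both trees."""
    same = sum(predict(tree_a, x) == predict(tree_b, x) for x in points)
    return same / len(points)

def make_data(rng, n=200, noise=0.1):
    """Made-up data: class 1 above the line x0 + x1 = 1, with some label noise."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] + x[1] > 1.0)
        if rng.random() < noise:
            y = 1 - y
        rows.append((x, y))
    return rows

rng = random.Random(0)
data = make_data(rng)

# Two bootstrap resamples of the same data play the role of "fresh samples".
sample_a = [rng.choice(data) for _ in data]
sample_b = [rng.choice(data) for _ in data]
tree_a = grow(sample_a)
tree_b = grow(sample_b)

# Compare the two trees on a regular grid of reference points.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
sim = agreement(tree_a, tree_b, grid)
print("root split A:", tree_a[:2] if isinstance(tree_a, tuple) else tree_a)
print("root split B:", tree_b[:2] if isinstance(tree_b, tuple) else tree_b)
print("prediction agreement:", round(sim, 3))
```

A prediction-based score like this one ignores tree structure, so two trees with different splits can still score high if they partition the space similarly. Any score of this kind turns the qualitative comparison mentioned in the abstract into a number that can be averaged over many resamples.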