Detecting Irrelevant Subtrees to Improve Probabilistic Learning from Tree-structured Data

  • Authors:
  • Amaury Habrard;Marc Bernard;Marc Sebban

  • Affiliations:
  • EURISE - Université Jean Monnet de Saint-Etienne 23, rue du Dr Paul Michelon, 42023 Saint-Etienne cedex 2, France. amaury.habrard@univ-st-etienne.fr/ marc.bernard@univ-st-etienne.fr/ marc.seb ...;EURISE - Université Jean Monnet de Saint-Etienne 23, rue du Dr Paul Michelon, 42023 Saint-Etienne cedex 2, France. amaury.habrard@univ-st-etienne.fr/ marc.bernard@univ-st-etienne.fr/ marc.seb ...;(Correspd.) EURISE - Université Jean Monnet de Saint-Etienne 23, rue du Dr Paul Michelon, 42023 Saint-Etienne cedex 2, France. amaury.habrard@univ-st-etienne.fr/ marc.bernard@univ-st-etienne. ...

  • Venue:
  • Fundamenta Informaticae - Advances in Mining Graphs, Trees and Sequences
  • Year:
  • 2005

Abstract

Given the rapid growth in the amount of available structured data (such as XML documents), many algorithms have emerged for dealing with tree-structured data. In this article, we present a probabilistic approach that aims to prune noisy or irrelevant subtrees from a set of trees a priori. The originality of this approach, compared with classic data reduction techniques, is that only a part of a tree (i.e. a subtree) can be deleted, rather than the whole tree itself. Our method is based on confidence intervals, computed on a partition of subtrees according to a given probability distribution. We propose an original approach to assess these intervals on tree-structured data, and we experimentally show its usefulness in the presence of noise.