Detecting Irrelevant Subtrees to Improve Probabilistic Learning from Tree-structured Data

Authors:
Amaury Habrard; Marc Bernard; Marc Sebban
Affiliations:
EURISE - Université Jean Monnet de Saint-Etienne, 23 rue du Dr Paul Michelon, 42023 Saint-Etienne cedex 2, France. amaury.habrard@univ-st-etienne.fr, marc.bernard@univ-st-etienne.fr, marc.seb ... (affiliation shared by all three authors; corresponding author: Marc Sebban)
Venue:
Fundamenta Informaticae - Advances in Mining Graphs, Trees and Sequences
Year:
2005

Abstract

Given the rapid growth in the amount of available structured data (such as XML documents), many algorithms have emerged for dealing with tree-structured data. In this article, we present a probabilistic approach that aims at pruning noisy or irrelevant subtrees a priori from a set of trees. The originality of this approach, compared with classic data reduction techniques, is that only a part of a tree (i.e. a subtree) can be deleted, rather than the whole tree. Our method relies on confidence intervals, computed over a partition of subtrees according to a given probability distribution. We propose an original approach to assess these intervals on tree-structured data and show experimentally that it is useful in the presence of noise.
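
The abstract does not give the details of the interval computation, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that trees are nested tuples, that subtrees are grouped by exact identity (a stand-in for the paper's partition), that the probability of a subtree is estimated by its relative frequency, and that a normal-approximation interval is used. The names `subtrees`, `proportion_interval`, `irrelevant_subtrees`, and `prune`, as well as the `threshold` parameter, are hypothetical.

```python
import math
from collections import Counter

# Assumption: a tree is a nested tuple (label, child, child, ...);
# a leaf is a one-element tuple such as ("b",).

def subtrees(tree):
    """Yield every complete subtree of `tree`, root first."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def proportion_interval(count, total, z=1.96):
    """Normal-approximation confidence interval for the proportion count/total."""
    p = count / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)

def irrelevant_subtrees(trees, threshold, z=1.96):
    """Return the subtrees whose interval upper bound is below `threshold`.

    Assumption: this decision rule is illustrative only.
    """
    counts = Counter(s for t in trees for s in subtrees(t))
    total = sum(counts.values())
    return {
        s for s, c in counts.items()
        if proportion_interval(c, total, z)[1] < threshold
    }

def prune(tree, irrelevant):
    """Copy `tree` without the flagged subtrees; the root itself is always kept."""
    label, *children = tree
    kept = tuple(prune(c, irrelevant) for c in children if c not in irrelevant)
    return (label, *kept)

# Fifty identical trees plus one tree carrying a rare leaf "x".
sample = [("a", ("b",), ("c",))] * 50 + [("a", ("b",), ("x",))]
flagged = irrelevant_subtrees(sample, threshold=0.05)
cleaned = [prune(t, flagged) for t in sample]
```

In this toy sample the rare leaf is removed from the last tree while the rest of that tree is kept, which is the point the abstract makes about deleting a subtree rather than the whole tree. A real implementation would need a more careful partition, since any subtree that contains a rare part is itself rare under exact-identity grouping.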