Given the rapid growth in the amount of available structured data (such as XML documents), many algorithms have emerged for dealing with tree-structured data. In this article, we present a probabilistic approach that aims to prune, a priori, noisy or irrelevant subtrees from a set of trees. The originality of this approach, compared with classic data reduction techniques, is that only a part of a tree (i.e. a subtree) can be deleted, rather than the tree as a whole. Our method relies on confidence intervals, computed over a partition of subtrees according to a given probability distribution. We propose an original way to assess these intervals on tree-structured data and experimentally demonstrate its benefits in the presence of noise.
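To make the idea concrete, here is a minimal sketch of confidence-interval-based subtree pruning. It is not the paper's algorithm: the authors partition subtrees and derive intervals from a given probability distribution, whereas this sketch uses empirical subtree frequencies with a normal-approximation (Wald) interval. All function names and thresholds (`prune_rare_subtrees`, `min_upper`, `z`) are hypothetical, chosen for illustration.

```python
import math
from collections import Counter

def canon(tree):
    """Hashable canonical form of a tree given as (label, [children])."""
    label, children = tree
    return (label, tuple(canon(c) for c in children))

def subtrees(tree):
    """Yield every subtree of a tree, including the tree itself."""
    yield tree
    _, children = tree
    for child in children:
        yield from subtrees(child)

def prune_rare_subtrees(trees, z=1.96, min_upper=0.05):
    """Delete subtrees whose occurrence probability is confidently low.

    A child subtree is removed when the upper bound of a normal-approximation
    confidence interval on its empirical frequency falls below `min_upper`,
    i.e. when it is likely noise. Roots are never removed: as in the paper,
    only parts of trees are deleted, never whole trees.
    """
    counts = Counter()
    total = 0
    for t in trees:
        for s in subtrees(t):
            counts[canon(s)] += 1
            total += 1

    def keep(s):
        p = counts[canon(s)] / total
        upper = p + z * math.sqrt(p * (1 - p) / total)
        return upper >= min_upper

    def prune(tree):
        label, children = tree
        return (label, [prune(c) for c in children if keep(c)])

    return [prune(t) for t in trees]
```

For example, on a sample of 100 identical clean trees plus one tree carrying an extra rare child, the rare child's interval lies entirely below the threshold and only that subtree is removed; the frequent subtrees survive intact.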