Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Frequent Closures as a Concise Representation for Binary Data Mining
PADKK '00 Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications
Efficiently mining frequent trees in a forest
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees
IEEE Transactions on Knowledge and Data Engineering
Canonical forms for labelled trees and their applications in frequent subtree mining
Knowledge and Information Systems
Frequent Subtree Mining - An Overview
Fundamenta Informaticae - Advances in Mining Graphs, Trees and Sequences
Capturing practical natural language transformations
Machine Translation
Quantitative analysis of treebanks using frequent subtree mining methods
TextGraphs-4 Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing
An overview of probabilistic tree transducers for natural language processing
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
Hi-index | 0.00 |
The Varro toolkit is a system for identifying and counting a major class of regularity in treebanks and annotated natural language data in the form of tree-structures: frequently recurring unordered subtrees. This software has been designed for use in linguistics to be maximally applicable to actually existing treebanks and other stores of tree-structurable natural language data. It minimizes memory use so that moderately large treebanks are tractable on commonly available computer hardware. This article introduces condensed canonically ordered trees as a data structure for efficiently discovering frequently recurring unordered subtrees.