In this paper, we investigate the q-gram distance for ordered unlabeled trees (trees, for short). First, we formulate a q-gram as a tree with $q$ nodes that is isomorphic to a line graph, and we define the q-gram distance between two trees analogously to the q-gram distance between two strings. Then, using the depth sequence based on postorder, we design the algorithm EnumGram, which enumerates all q-grams in a tree $T$ with $n$ nodes in $O(n^{2})$ time and $O(q)$ space. Furthermore, we improve it to the algorithm LinearEnumGram, which runs in $O(qn)$ time and $O(qd)$ space, where $d$ is the depth of $T$. Hence, we can evaluate the q-gram distance $D_{q}(T_{1}, T_{2})$ between $T_{1}$ and $T_{2}$ in $O(q \max\{n_{1}, n_{2}\})$ time and $O(q \max\{d_{1}, d_{2}\})$ space, where $n_{i}$ and $d_{i}$ are the number of nodes in $T_{i}$ and the depth of $T_{i}$, respectively. Finally, we show the following relationship between the q-gram distance $D_{q}(T_{1}, T_{2})$ and the edit distance $E(T_{1}, T_{2})$: $D_{q}(T_{1}, T_{2}) \leq (gl+1)\, E(T_{1}, T_{2})$, where $g = \max\{g_{1}, g_{2}\}$, $l = \max\{l_{1}, l_{2}\}$, $g_{i}$ is the degree of $T_{i}$, and $l_{i}$ is the number of leaves in $T_{i}$. In particular, for the top-down tree edit distance $F(T_{1}, T_{2})$, this result implies that $D_{q}(T_{1}, T_{2}) \leq \min\{g^{q-2}, l - 1\}\, F(T_{1}, T_{2})$.
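To make the notion of a q-gram distance between trees concrete, the following is a minimal sketch and not the EnumGram or LinearEnumGram algorithm of the paper. It rests on several assumptions that the abstract does not state: a tree is written as a nested Python list of children (a leaf is `[]`), a q-gram occurrence is any path with $q$ nodes, its shape is identified by the number of nodes on the left arm below its topmost node (0 for a straight downward path), and the distance is the sum of absolute differences between the two shape counts, by analogy with the string case. The names `qgram_profile` and `qgram_distance` are hypothetical.

```python
from collections import Counter


def qgram_profile(tree, q):
    """Count the paths with q nodes in `tree`, keyed by left-arm size.

    Assumed representation: a node is the list of its children, a leaf is [].
    Key 0 counts straight downward paths; key a >= 1 counts paths whose
    topmost node has a nodes on its left arm and q - 1 - a on its right arm.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    profile = Counter()

    def visit(node):
        # left[d]: number of nodes at depth d below `node`, counted over
        # the children processed so far (index 0 is unused).
        left = [0] * q
        for child in node:
            sub = visit(child)  # sub[d]: descendants of `child` at depth d
            for a in range(1, q - 1):
                b = q - 1 - a  # nodes on the right arm, inside `child`
                profile[a] += left[a] * sub[b - 1]
            for d in range(1, q):
                left[d] += sub[d - 1]
        levels = [1] + left[1:]  # descendants of `node` by relative depth
        profile[0] += levels[q - 1]  # straight paths starting at `node`
        return levels

    visit(tree)
    return profile


def qgram_distance(t1, t2, q):
    """Sum of absolute differences between the two q-gram profiles."""
    p1, p2 = qgram_profile(t1, q), qgram_profile(t2, q)
    return sum(abs(p1[k] - p2[k]) for k in set(p1) | set(p2))


# A root with two leaves versus a chain of three nodes: each tree contains
# exactly one 3-gram, but of a different shape.
cherry = [[], []]
chain = [[[]]]
print(qgram_distance(cherry, chain, 3))  # prints 2
```

Under these assumptions the sketch does a constant number of length-$q$ passes per node and keeps one length-$q$ list per level of recursion. The recursion is for brevity only and would need an explicit stack for very deep trees.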