ATreeGrep: Approximate Searching in Unordered Trees

Authors:
Dennis Shasha;Jason Tsong-Li Wang;Huiyuan Shan;Kaizhong Zhang
Venue:
SSDBM '02 Proceedings of the 14th International Conference on Scientific and Statistical Database Management
Year:
2002

Citing 0
Cited 15

Fast structural query with application to Chinese treebank sentence retrieval
Proceedings of the 2004 ACM symposium on Document engineering

A survey on tree edit distance and related problems
Theoretical Computer Science

Layout based document image retrieval by means of XY tree reduction
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition

Frequent Subtree Mining - An Overview
Fundamenta Informaticae - Advances in Mining Graphs, Trees and Sequences

Parameterized pattern queries
Data & Knowledge Engineering

Fragment-based approximate retrieval in highly heterogeneous XML collections
Data & Knowledge Engineering

Authoring adaptive educational hypermedia on the semantic desktop
International Journal of Learning Technology

Clustered trie structures for approximate search in hierarchical objects collections
ICAPR'05 Proceedings of the Third international conference on Advances in Pattern Recognition - Volume Part I

Automatic and manual annotation using flexible schemas for adaptation on the semantic desktop
EC-TEL'06 Proceedings of the First European conference on Technology Enhanced Learning: innovative Approaches for Learning and Knowledge Sharing

pest: Fast approximate keyword search in semantic data using eigenvector-based term propagation
Information Systems

Biomonitoring, phylogenetics and anomaly aggregation systems
ISI'05 Proceedings of the 2005 IEEE international conference on Intelligence and Security Informatics

Approximate subtree identification in heterogeneous XML documents collections
XSym'05 Proceedings of the Third international conference on Database and XML Technologies

Highly heterogeneous XML collections: how to retrieve precise results?
FQAS'06 Proceedings of the 7th international conference on Flexible Query Answering Systems

Efficient indexing and querying over syntactically annotated trees
Proceedings of the VLDB Endowment

Abstract

An unordered labeled tree is a tree in which each node has a string label and the parent-child relationship is significant, but the order among siblings is unimportant. This paper presents an approach to the nearest neighbor search problem for these trees. Given a database D of unordered labeled trees and a query tree Q, the goal is to find those trees in D that "approximately" contain Q. Our approach is based on storing the paths of the trees in a suffix array and then counting the number of mismatching paths between the query tree and a data tree. To speed up a search, we use a hash-based technique to filter out unqualified data trees at an early stage of the search. Experimental results obtained by running our techniques on phylogenetic trees and synthetic data demonstrate the good performance of the proposed approach. We also discuss the use of our work in XML and scientific database management.
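
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It rests on several assumptions: a tree is written as a hypothetical `(label, [children])` tuple, the "suffix array" is a plain sorted list of every suffix of every root-to-leaf path, a query path counts as matching when it occurs as a contiguous sub-path of some data path, and the hash filter simply checks hashed parent-child label pairs. All function names (`root_to_leaf_paths`, `build_suffix_array`, `occurs`, `edge_pairs`, `mismatch_lower_bound`, `search`) are invented for illustration.

```python
from bisect import bisect_left

# Assumed representation: a tree is (label, [children]). Sibling order is
# ignored because only root-to-leaf label paths are compared.

def root_to_leaf_paths(tree, prefix=()):
    """Return every root-to-leaf path of the tree as a tuple of labels."""
    label, children = tree
    path = prefix + (label,)
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(root_to_leaf_paths(child, path))
    return paths

def build_suffix_array(paths):
    """Sorted list of all suffixes of all paths (a naive stand-in for a suffix array)."""
    return sorted({p[i:] for p in paths for i in range(len(p))})

def occurs(suffixes, query_path):
    """True if query_path is a prefix of some stored suffix (binary search)."""
    i = bisect_left(suffixes, query_path)
    return i < len(suffixes) and suffixes[i][:len(query_path)] == query_path

def edge_pairs(paths):
    """Hashes of all parent-child label pairs on the given paths."""
    return {hash(pair) for p in paths for pair in zip(p, p[1:])}

def mismatch_lower_bound(query_paths, data_pairs):
    """Count query paths with a parent-child pair absent from the data tree.
    Each such path must mismatch, so this never exceeds the true count."""
    return sum(
        any(hash(pair) not in data_pairs for pair in zip(p, p[1:]))
        for p in query_paths
    )

def search(database, query, max_mismatches):
    """Return (name, mismatches) for data trees within the mismatch budget."""
    q_paths = root_to_leaf_paths(query)
    hits = []
    for name, tree in database.items():
        d_paths = root_to_leaf_paths(tree)
        # Cheap hash-based filter first: skip trees that cannot qualify.
        if mismatch_lower_bound(q_paths, edge_pairs(d_paths)) > max_mismatches:
            continue
        suffixes = build_suffix_array(d_paths)
        mismatches = sum(not occurs(suffixes, p) for p in q_paths)
        if mismatches <= max_mismatches:
            hits.append((name, mismatches))
    return sorted(hits, key=lambda h: h[1])
```

For example, with a database holding `("a", [("b", [("c", []), ("d", [])]), ("e", [])])` and a query `("b", [("d", []), ("c", [])])`, both query paths occur in the data tree even though the siblings are listed in a different order. A real system would build the path index and the hash sets once, ahead of query time, rather than per query as this sketch does.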