An unordered labeled tree is a tree in which each node has a string label and the parent-child relationship is significant, but the order among siblings is unimportant. This paper presents an approach to the nearest neighbor search problem for these trees. Given a database D of unordered labeled trees and a query tree Q, the goal is to find those trees in D that "approximately" contain Q. Our approach is based on storing the paths of the trees in a suffix array and then counting the number of mismatching paths between the query tree and a data tree. To speed up a search, we use a hash-based technique to filter out unqualified data trees at an early stage of the search. Experimental results obtained by running our techniques on phylogenetic trees and synthetic data demonstrate the good performance of the proposed approach. We also discuss the use of our work in XML and scientific database management.
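The idea of counting mismatching paths can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions: trees are nested `(label, children)` tuples, the query is decomposed into root-to-leaf label paths, a plain Python `set` of downward paths stands in for the suffix array, and the pre-filter is a simple label-set check rather than the paper's hash-based technique. All function names and the `max_mismatches` threshold are hypothetical.

```python
# Illustrative sketch only: a set of paths replaces the suffix array,
# and a label-set check replaces the paper's hash-based filter.

def root_to_leaf_paths(tree):
    """Return the label paths from the root of `tree` to each of its leaves."""
    label, children = tree
    if not children:
        return [(label,)]
    return [(label,) + p for child in children for p in root_to_leaf_paths(child)]


def downward_paths(tree):
    """Return every label path that starts at any node and follows
    parent-child edges downward. Sibling order plays no role."""
    paths = set()

    def visit(node):
        label, children = node
        starting_here = {(label,)}
        for child in children:
            for p in visit(child):
                starting_here.add((label,) + p)
        paths.update(starting_here)
        return starting_here

    visit(tree)
    return paths


def labels(tree):
    """Return the set of labels occurring in `tree`."""
    label, children = tree
    found = {label}
    for child in children:
        found |= labels(child)
    return found


def search(database, query, max_mismatches):
    """Return (name, mismatch_count) pairs for data trees whose number of
    mismatching query paths is at most `max_mismatches`, best match first."""
    query_paths = root_to_leaf_paths(query)
    results = []
    for name, tree in database.items():
        # Cheap pre-filter: a query path that uses a label absent from the
        # data tree cannot occur in it, so this is a lower bound on mismatches.
        tree_labels = labels(tree)
        lower_bound = sum(
            1 for p in query_paths if any(l not in tree_labels for l in p)
        )
        if lower_bound > max_mismatches:
            continue
        data_paths = downward_paths(tree)
        mismatches = sum(1 for p in query_paths if p not in data_paths)
        if mismatches <= max_mismatches:
            results.append((name, mismatches))
    return sorted(results, key=lambda r: (r[1], r[0]))


# Hypothetical example data: t2 is t1 with its siblings reordered.
database = {
    "t1": ("a", [("b", [("c", []), ("d", [])]), ("e", [])]),
    "t2": ("a", [("e", []), ("b", [("d", []), ("c", [])])]),
    "t3": ("x", [("y", [])]),
}
query = ("b", [("c", []), ("f", [])])
matches = search(database, query, max_mismatches=1)
```

Because only parent-child paths are compared, `t1` and `t2` receive the same score even though their siblings appear in a different order, and `t3` is discarded by the pre-filter before any paths are enumerated.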