Propagation-vectors for trees (PVT): concise yet effective summaries for hierarchical data and trees

Authors:
Venkata Snehith Cherukuri;Kasim Selçuk Candan
Affiliations:
Arizona State University, Tempe, AZ, USA;Arizona State University, Tempe, AZ, USA
Venue:
Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval
Year:
2008

Abstract

Summarization of hierarchical data and metadata is a fundamental operation in applications across many domains. In particular, similarity search over hierarchical data, such as XML, would benefit greatly from concise and indexable summaries. This is especially true in P2P scenarios, where the search must be carried out in a distributed fashion across multiple peers. Such settings require summaries that are small, yet effective in identifying the peers that merit further exploration. In this paper, we propose a method, called propagation-vectors for trees (PVT), which constructs very concise and accurate summaries of hierarchical data, such as XML trees. We then show how to use these summaries to perform similarity search on the summarized data. The proposed summarization scheme relies on a label-propagation mechanism, which constructs an n-dimensional vector from a given tree with n unique data labels. Experimental results show that the constructed PVT summaries capture the structure of the input trees very accurately, that the representations are highly concise, and that search based on these summaries is faster than existing approaches.
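
The abstract does not spell out the propagation rule, so the following is only an illustrative sketch of the general idea: mapping a tree with n unique labels to an n-dimensional vector and comparing trees through those vectors. The tree encoding as `(label, children)` pairs, the `decay` parameter, the depth-based attenuation, and the use of cosine similarity are all assumptions made for illustration, not the PVT method as defined in the paper.

```python
import math
from collections import defaultdict


def propagation_vector(tree, decay=0.5):
    """Map a tree to a {label: weight} vector, one dimension per unique label.

    Assumed rule (not taken from the paper): the root carries weight 1.0,
    and each step down the tree multiplies the weight by `decay`.
    `tree` is a hypothetical (label, children) pair.
    """
    vec = defaultdict(float)
    stack = [(tree, 1.0)]
    while stack:
        (label, children), weight = stack.pop()
        vec[label] += weight  # repeated labels accumulate in one dimension
        for child in children:
            stack.append((child, weight * decay))
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse {label: weight} vectors."""
    dot = sum(w * v.get(label, 0.0) for label, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Two small XML-like trees that differ by one nested element.
book_a = ("book", [("title", []), ("author", [("name", [])])])
book_b = ("book", [("title", []), ("author", [])])

vec_a = propagation_vector(book_a)
vec_b = propagation_vector(book_b)
similarity = cosine(vec_a, vec_b)
```

Under these assumptions, each summary has one entry per distinct label regardless of how many nodes the tree has, which is what would make such vectors compact enough to exchange between peers and to index.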