A bounded distance metric for comparing tree structure

Authors:
Richard Connor; Fabio Simeoni; Michael Iakovos; Robert Moss
Affiliations:
Department of Computer and Information Sciences, University of Strathclyde, Glasgow G1 1HX, Scotland, UK (all authors)
Venue:
Information Systems
Year:
2011

Abstract

Comparing tree-structured data for structural similarity is a recurring theme and one on which much effort has been spent. Most approaches so far are grounded, implicitly or explicitly, in algorithmic information theory, being approximations to an information distance derived from Kolmogorov complexity. In this paper we propose a novel complexity metric, also grounded in information theory, but calculated via Shannon's entropy equations. This is used to formulate a directly and efficiently computable metric for the structural difference between unordered trees. The paper explains the derivation of the metric in terms of information theory, and proves the essential property that it is a distance metric. The property of boundedness means that the metric can be used in contexts such as clustering, where second-order comparisons are required. The distance metric property means that the metric can be used in the context of similarity search and metric spaces in general, allowing trees to be indexed and stored within this domain. We are not aware of any other tree similarity metric with these properties.
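The abstract does not state the formula, so the following is only a hedged illustration of the ingredients it names (Shannon entropy, boundedness, the metric property, unordered trees). It is not the authors' metric. It swaps in a different, well-known construction. Each tree is flattened into a multiset of root-to-node label paths, which is an assumed representation. Two trees are then compared with the Jensen-Shannon distance, the square root of the base-2 Jensen-Shannon divergence, which is known to satisfy the triangle inequality and to take values in [0, 1]. The function names `label_paths`, `path_distribution`, `entropy` and `js_distance` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical representation: a tree is (label, [children]).
# Child order is never used, so trees are treated as unordered.

def label_paths(tree, prefix=()):
    """Yield the root-to-node label path of every node in the tree."""
    label, children = tree
    path = prefix + (label,)
    yield path
    for child in children:
        yield from label_paths(child, path)

def path_distribution(tree):
    """Normalise the multiset of label paths into a probability distribution."""
    counts = Counter(label_paths(tree))
    total = sum(counts.values())
    return {path: n / total for path, n in counts.items()}

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def js_distance(t1, t2):
    """Jensen-Shannon distance (swapped-in technique, not the paper's metric).

    JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2 with M = (P + Q) / 2 and base-2 logs,
    so JSD lies in [0, 1]; its square root is a bounded metric on distributions.
    """
    p, q = path_distribution(t1), path_distribution(t2)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in p.keys() | q.keys()}
    jsd = entropy(m) - 0.5 * (entropy(p) + entropy(q))
    # Clamp tiny floating-point excursions outside [0, 1].
    return math.sqrt(min(max(jsd, 0.0), 1.0))

a = ("root", [("a", []), ("b", [])])
b = ("root", [("b", []), ("a", [])])  # same tree with its children reordered
c = ("x", [("y", [])])                # shares no label paths with a
print(js_distance(a, b), js_distance(a, c))
```

Because child order never enters the path multiset, reordering children leaves the distance at zero, while trees that share no label paths sit at the upper bound of 1. Distinct trees can, however, produce the same path distribution under this representation, so on trees themselves the sketch is strictly a pseudo-metric.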