Measuring similarity between collections of values

  • Authors:
  • Carina F. Dorneles; Carlos A. Heuser; Andrei E. N. Lima; Altigran Soares da Silva; Edleno Silva de Moura

  • Affiliations:
  • Universidade Federal do Rio Grande do Sul (UFRGS); Universidade Federal do Rio Grande do Sul (UFRGS); Universidade Federal do Rio Grande do Sul (UFRGS); Universidade Federal do Amazonas; Universidade Federal do Amazonas

  • Venue:
  • Proceedings of the 6th annual ACM international workshop on Web information and data management
  • Year:
  • 2004

Abstract

In this paper, we propose a set of similarity metrics for manipulating collections of values occurring in XML documents. Following the data model presented in the TAX algebra, we treat an XML element as a labeled, ordered, rooted tree. Considering that XML nodes can be either atomic, i.e., they contain single values such as short character strings or dates, or complex, i.e., nested structures that contain other nodes, we propose two types of similarity metrics: MAVs, for atomic nodes, and MCVs, for complex nodes. In the first case, we suggest the use of several metrics that depend on the application domain. In the second case, we define metrics for complex values that depend on structure and can be applied separately to tuples and to collections of values. We also present experiments showing the effectiveness of our method.
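
The split between atomic and complex nodes can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not give the MAV or MCV formulas, so everything here is an assumption made for illustration. The function names (atomic_sim, tuple_sim, collection_sim, node_sim) are hypothetical, difflib's string ratio stands in for a domain-dependent atomic metric, a dict stands in for a tuple of labeled children, and a list stands in for a collection.

```python
from difflib import SequenceMatcher


def atomic_sim(a: str, b: str) -> float:
    """Stand-in atomic metric (assumption): case-insensitive string ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def tuple_sim(x: dict, y: dict) -> float:
    """Assumed tuple metric: average child similarity over all labels.

    Labels present in only one of the two tuples contribute 0.
    """
    labels = set(x) | set(y)
    if not labels:
        return 1.0
    total = sum(node_sim(x[k], y[k]) for k in labels if k in x and k in y)
    return total / len(labels)


def collection_sim(xs: list, ys: list) -> float:
    """Assumed collection metric: best match per item, normalised by size.

    Each item of xs is paired with its most similar item in ys; dividing
    by the larger length penalises collections of different sizes.
    Note that this simple scheme is not symmetric in general.
    """
    if not xs and not ys:
        return 1.0
    if not xs or not ys:
        return 0.0
    best = [max(node_sim(x, y) for y in ys) for x in xs]
    return sum(best) / max(len(xs), len(ys))


def node_sim(x, y) -> float:
    """Dispatch on node kind: atomic (str), tuple (dict), collection (list)."""
    if isinstance(x, str) and isinstance(y, str):
        return atomic_sim(x, y)
    if isinstance(x, dict) and isinstance(y, dict):
        return tuple_sim(x, y)
    if isinstance(x, list) and isinstance(y, list):
        return collection_sim(x, y)
    return 0.0  # nodes of different kinds are treated as dissimilar


if __name__ == "__main__":
    a = {"author": ["Carina Dorneles", "Carlos Heuser"], "year": "2004"}
    b = {"author": ["C. Dorneles"], "year": "2004"}
    print(round(node_sim(a, b), 3))
```

The recursion mirrors the structure-dependent idea in the abstract: the score of a complex node is built from the scores of its children, and only the leaves use a string-level metric. In a real application the atomic metric would be chosen per domain, for example a date-aware comparison for date values.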