Evaluating Performance and Quality of XML-Based Similarity Joins

  • Authors:
  • Leonardo Ribeiro; Theo Härder

  • Affiliations:
  • AG DBIS, Department of Computer Science, University of Kaiserslautern, Germany (both authors)

  • Venue:
  • ADBIS '08 Proceedings of the 12th East European conference on Advances in Databases and Information Systems
  • Year:
  • 2008

Abstract

A similarity join correlating fragments in XML documents, which are similar in structure and content, can be used as the core algorithm to support data cleaning and data integration tasks. For this reason, built-in support for such an operator in an XML database management system (XDBMS) is very attractive. However, similarity assessment is especially difficult on XML datasets, because structure, besides textual information, may embody variations in XML documents representing the same real-world entity. Moreover, the similarity computation is considerably more expensive for tree-structured objects and should, therefore, be a prime optimization candidate. In this paper, we explore and optimize tree-based similarity joins and analyze their performance and accuracy when embedded in native XDBMSs.