Evaluating Performance and Quality of XML-Based Similarity Joins

Authors:
Leonardo Ribeiro; Theo Härder
Affiliations:
AG DBIS, Department of Computer Science, University of Kaiserslautern, Germany (both authors)
Venue:
ADBIS '08 Proceedings of the 12th East European conference on Advances in Databases and Information Systems
Year:
2008

References (14):
Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing.
Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science - Selected papers of the Combinatorial Pattern Matching School.
A guided tour to approximate string matching. ACM Computing Surveys (CSUR).
Approximate String Joins in a Database (Almost) for Free. Proceedings of the 27th International Conference on Very Large Data Bases.
Text joins in an RDBMS for web data integration. WWW '03 Proceedings of the 12th international conference on World Wide Web.
Efficient set joins on similarity predicates. SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data.
Approximate matching of hierarchical data using pq-grams. VLDB '05 Proceedings of the 31st international conference on Very large data bases.
A Primitive Operator for Similarity Joins in Data Cleaning. ICDE '06 Proceedings of the 22nd International Conference on Data Engineering.
Integrating XML data sources using approximate joins. ACM Transactions on Database Systems (TODS).
Efficient exact set-similarity joins. VLDB '06 Proceedings of the 32nd international conference on Very large data bases.
Node labeling schemes for dynamic XML documents reconsidered. Data & Knowledge Engineering.
An efficient infrastructure for native transactional XML processing. Data & Knowledge Engineering.
Benchmarking declarative approximate selection predicates. Proceedings of the 2007 ACM SIGMOD international conference on Management of data.
Comparison of Complete and Elementless Native Storage of XML Documents. IDEAS '07 Proceedings of the 11th International Database Engineering and Applications Symposium.

Cited by (2):
Ingredients for accurate, fast, and robust XML similarity joins. DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part II.
Leveraging the storage layer to support XML similarity joins in XDBMSs. ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems.

Abstract

A similarity join that correlates XML document fragments similar in both structure and content can serve as the core algorithm for data cleaning and data integration tasks. For this reason, built-in support for such an operator in an XML database management system (XDBMS) is very attractive. However, similarity assessment is especially difficult on XML datasets because structure, in addition to textual content, may vary across XML documents that represent the same real-world entity. Moreover, similarity computation is considerably more expensive for tree-structured objects and is therefore a prime candidate for optimization. In this paper, we explore and optimize tree-based similarity joins and analyze their performance and accuracy when embedded in native XDBMSs.