Leveraging the storage layer to support XML similarity joins in XDBMSs

Authors:
Leonardo Andrade Ribeiro; Theo Härder
Affiliations:
Department of Computer Science, Federal University of Lavras, Brazil; AG DBIS, Department of Computer Science, University of Kaiserslautern, Germany
Venue:
ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
Year:
2012

Abstract

XML is widely used to describe the semi-structured data that modern information systems commonly generate and consume, which makes XML database management systems (XDBMSs) essential platforms in this context. Most XDBMS architectures proposed so far aim to reproduce functionality found in relational systems. As a result, they inherit the same deficiency as traditional systems in dealing with less-structured data. What is needed is efficient support for common database operations under the similarity matching paradigm. In this paper, we present an engineering approach to incorporating similarity joins into XDBMSs that exploits XDBMS components, the storage layer in particular, to design efficient algorithms. We experimentally confirm the accuracy, performance, and scalability of our approach.
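
To make the notion of a similarity join concrete, the following minimal Python sketch pairs XML fragments whose text content is similar enough. It is an illustration only and not the algorithm of the paper: the token-set representation, the Jaccard measure, the quadratic pairwise comparison, the sample fragments, and the names token_set, jaccard, and similarity_self_join are all assumptions chosen for brevity. It also works on in-memory strings and does not touch an XDBMS storage layer.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def token_set(xml_text):
    """Collect lowercase word tokens from all text nodes of one XML fragment."""
    root = ET.fromstring(xml_text)
    tokens = set()
    for text in root.itertext():
        tokens.update(text.lower().split())
    return tokens


def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union.

    Defined here as 0.0 when both sets are empty (an assumption of this sketch).
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_self_join(fragments, threshold):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold.

    Naive all-pairs comparison; a real system would prune candidates first.
    """
    sets = [token_set(f) for f in fragments]
    return [
        (i, j)
        for i, j in combinations(range(len(sets)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]


# Hypothetical sample data, not taken from the paper.
fragments = [
    "<rec><title>native XML database</title></rec>",
    "<rec><title>a native XML database system</title></rec>",
    "<rec><title>duplicate record detection</title></rec>",
]
pairs = similarity_self_join(fragments, threshold=0.5)
print(pairs)
```

With these sample fragments, only the first two records share enough tokens (3 of 5 distinct tokens) to pass a threshold of 0.5. An exact-match join would return nothing here, which is the gap that similarity matching is meant to close.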