A clustering method based on path similarities of XML data

Authors:
Ilhwan Choi; Bongki Moon; Hyoung-Joo Kim
Affiliations:
School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Republic of Korea; Department of Computer Science, University of Arizona, Tucson, AZ 85721, United States; School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Republic of Korea
Venue:
Data & Knowledge Engineering
Year:
2007

Abstract

Current studies on the storage of XML data focus either on efficiently mapping XML data onto an existing RDBMS or on developing native XML storage. Some native XML storage systems store each XML node as a parsed object. In this storage model, clustering, that is, the physical arrangement of objects, can be an important factor in performance. In this paper, we propose a clustering method for storing the data nodes of an XML document in native XML storage. The method groups data nodes by the similarity of their paths, which can reduce the page I/Os required for query processing. To benefit from this clustering, we also propose a query processing method that uses signatures to support cluster-level access to the stored data. It processes a path query by accessing only a small number of clusters rather than all of them, so unnecessary data is skipped. Finally, we compare the performance of the proposed method with that of existing ones. Our results show that a proper clustering method can improve the performance of XML storage.
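
The two ideas in the abstract, grouping nodes by path similarity and skipping clusters with signatures, can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not define the similarity measure, the clustering procedure, or the signature scheme. The shared-prefix ratio in `path_similarity`, the greedy grouping in `cluster_by_path`, the 0.6 threshold, and the 64-bit CRC-based `signature` are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
import zlib


def label_paths(xml_text):
    """Yield the root-to-node label path of every element as a tuple."""
    root = ET.fromstring(xml_text)
    stack = [((root.tag,), root)]
    while stack:
        path, node = stack.pop()
        yield path
        for child in node:
            stack.append((path + (child.tag,), child))


def path_similarity(p, q):
    """Assumed measure: shared-prefix length divided by the longer path length."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return common / max(len(p), len(q))


def cluster_by_path(paths, threshold=0.6):
    """Assumed procedure: greedily add each path to the first cluster whose
    representative is similar enough, otherwise start a new cluster."""
    clusters = []  # list of (representative_path, member_paths)
    for p in paths:
        for rep, members in clusters:
            if path_similarity(rep, p) >= threshold:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def signature(labels, bits=64):
    """Superimposed-coding style signature: OR together one hashed bit per label."""
    sig = 0
    for label in labels:
        sig |= 1 << (zlib.crc32(label.encode("utf-8")) % bits)
    return sig


def candidate_clusters(clusters, query_path):
    """Return indices of clusters whose signature covers every query label.
    The filter can yield false positives but never misses a real match."""
    q = signature(query_path)
    hits = []
    for i, (_, members) in enumerate(clusters):
        cluster_sig = signature(label for path in members for label in path)
        if (cluster_sig & q) == q:
            hits.append(i)
    return hits


DOC = (
    "<site><people><person><name/><email/></person></people>"
    "<items><item><name/><price/></item></items></site>"
)

paths = sorted(set(label_paths(DOC)))
clusters = cluster_by_path(paths)
query = ("site", "people", "person", "name")

# Only the candidate clusters need to be read; each is then checked exactly.
for i in candidate_clusters(clusters, query):
    if query in clusters[i][1]:
        print("query path found in cluster", i)
```

In a real store, the signature of each cluster would be computed once and kept with the cluster, so that a query reads the signatures first and fetches only the pages of clusters that pass the filter.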