A clustering method based on path similarities of XML data

  • Authors:
  • Ilhwan Choi;Bongki Moon;Hyoung-Joo Kim

  • Affiliations:
  • School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Republic of Korea;Department of Computer Science, University of Arizona, Tucson, AZ 85721, United States;School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Republic of Korea

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2007

Abstract

Current studies on the storage of XML data focus on either the efficient mapping of XML data onto an existing RDBMS or the development of native XML storage. Some native XML storage systems store each XML node in a parsed object form. Clustering, that is, the physical arrangement of objects, can be an important factor in improving performance under this storage model. In this paper, we propose a clustering method for storing the data nodes of an XML document in native XML storage. The proposed method clusters data nodes by their path similarities, which can reduce the page I/Os required for query processing. In addition, we propose a query processing method that uses signatures to facilitate cluster-level access to the stored data and thereby benefit from the proposed clustering. This method processes a path query by accessing only a small number of clusters rather than all of them, so that unnecessary data is skipped and the query is processed efficiently. Finally, we compare the performance of the proposed method with that of existing ones. Our results show that the performance of XML storage can be improved by using a proper clustering method.
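The abstract does not specify the similarity measure or the signature encoding, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes, purely for illustration, that nodes with identical root-to-node label paths share a cluster and that a cluster signature is a 64-bit mask built by hashing each label on that path. The names `label_bit`, `path_signature`, `build_clusters`, and `candidate_clusters` are hypothetical.

```python
import zlib

SIG_BITS = 64  # signature width; an arbitrary choice for this sketch


def label_bit(label):
    """Map an element label to one bit position using a deterministic hash."""
    return 1 << (zlib.crc32(label.encode("utf-8")) % SIG_BITS)


def path_signature(labels):
    """OR together the bits of every label in the given sequence."""
    sig = 0
    for label in labels:
        sig |= label_bit(label)
    return sig


def build_clusters(nodes):
    """Group (node_id, path) pairs so that identical label paths share a cluster.

    Assumption: identical paths stand in for 'similar' paths here.
    Returns {path_tuple: {"signature": int, "nodes": [node_id, ...]}}.
    """
    clusters = {}
    for node_id, path in nodes:
        key = tuple(path)
        cluster = clusters.setdefault(
            key, {"signature": path_signature(key), "nodes": []}
        )
        cluster["nodes"].append(node_id)
    return clusters


def candidate_clusters(clusters, query_labels):
    """Return the keys of clusters whose signature covers every query label.

    Hash collisions can produce false positives but never false negatives,
    so candidates still need to be checked against the actual paths.
    """
    q = path_signature(query_labels)
    return [key for key, c in clusters.items() if c["signature"] & q == q]


# Example data: node ids with their root-to-node label paths (made up).
nodes = [
    (1, ["bib", "book", "title"]),
    (2, ["bib", "book", "author"]),
    (3, ["bib", "book", "title"]),
    (4, ["bib", "article", "title"]),
]
clusters = build_clusters(nodes)
candidates = candidate_clusters(clusters, ["book", "title"])
```

In this sketch, a query over the labels `book` and `title` only needs to read the clusters returned by `candidate_clusters`; clusters whose signatures lack one of the query bits can be skipped without being fetched, which is the kind of cluster-level pruning the abstract describes.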