A Low-Storage-Consumption XML Labeling Method for Efficient Structural Information Extraction

  • Authors:
  • Wenxin Liang; Akihiro Takahashi; Haruo Yokota

  • Affiliations:
  • School of Software, Dalian University of Technology; Department of Computer Science, Tokyo Institute of Technology; Department of Computer Science, Tokyo Institute of Technology, and Global Scientific Information and Computing Center, Tokyo Institute of Technology

  • Venue:
  • DEXA '09: Proceedings of the 20th International Conference on Database and Expert Systems Applications
  • Year:
  • 2009

Abstract

Labeling methods that extract and reconstruct the structural information of XML data are important for many applications, such as XPath queries and keyword search, and have recently attracted increasing attention. To achieve efficient structural information extraction, this paper proposes C-DO-VLEI code, a novel update-friendly bit-vector encoding scheme based on register-length bit operations combined with the properties of Dewey Order numbers, an approach that cannot be implemented in other related existing schemes such as ORDPATH. The proposed method also achieves lower storage consumption because it requires neither a prefix schema nor any reserved codes for node insertion. We performed experiments to evaluate and compare the performance and storage consumption of the proposed method with those of ORDPATH. Experimental results show that, compared with ORDPATH, the C-DO-VLEI code reduces the execution times for extracting depth information and parent node labels by about 25% and 15%, respectively, and yields an average label size about 24% smaller.
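The abstract builds on the properties of Dewey Order numbers, in which a node's label is its parent's label extended by one sibling ordinal. As a point of reference only, the following Python sketch shows why depth and parent extraction are natural operations on plain dotted Dewey labels: the depth is the number of components, and the parent label is the label with its last component removed. It does not model the C-DO-VLEI bit-vector encoding or its register-length bit operations, whose details are not given here; the function names and the convention that the root is labeled `1` at depth 1 are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Plain Dewey Order label as a tuple of positive integers, e.g. (1, 3, 2)
# for "1.3.2". This is NOT the C-DO-VLEI encoding; it only illustrates
# the Dewey Order properties that such schemes exploit.
DeweyLabel = Tuple[int, ...]


def parse_dewey(text: str) -> DeweyLabel:
    """Parse a dotted Dewey label such as '1.3.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def depth(label: DeweyLabel) -> int:
    """Depth of the node: one component per level (assumed root '1' has depth 1)."""
    return len(label)


def parent(label: DeweyLabel) -> Optional[DeweyLabel]:
    """Parent label: drop the last component; the root has no parent."""
    return label[:-1] if len(label) > 1 else None


def is_ancestor(a: DeweyLabel, d: DeweyLabel) -> bool:
    """a is a proper ancestor of d iff a is a proper prefix of d."""
    return len(a) < len(d) and d[:len(a)] == a


if __name__ == "__main__":
    node = parse_dewey("1.3.2")
    print(depth(node))              # 3
    print(parent(node))             # (1, 3)
    print(is_ancestor((1,), node))  # True
```

With a dotted or tuple representation these operations are linear in the number of components; the abstract's claim is that a bit-vector encoding designed around register-length operations can perform them faster than ORDPATH.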