VERT: a semantic approach for content search and content extraction in XML query processing

  • Authors: Huayu Wu, Tok Wang Ling, Bo Chen
  • Affiliations: School of Computing, National University of Singapore, Singapore (all authors)
  • Venue: ER'07 Proceedings of the 26th international conference on Conceptual modeling
  • Year: 2007

Abstract

Processing a twig pattern query over an XML document involves both structural search and content search. Most existing algorithms focus only on structural search: they treat content nodes the same as element nodes and process them with structural joins. Because contents vary widely, mixing content search with structural search makes contents hard to manage and degrades performance. A further disadvantage is that, to find the actual values a query asks for, these algorithms must go back to the original document. In this paper, we propose a novel algorithm, Value Extraction with Relational Table (VERT), to overcome these limitations. The main technique of VERT is to store document contents in relational tables instead of treating them as nodes and labeling them. The tables are created based on semantic information about the documents. As more semantics is captured, the tables and queries can be further optimized to significantly improve efficiency. Finally, we show experimentally that, besides solving these content problems, VERT outperforms existing algorithms in twig pattern query processing.
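
The core idea of keeping contents in a relational table, separate from the labeled element nodes, can be illustrated with a minimal sketch. This is not the authors' implementation: the single `content` table, the document-order integer labels, the `parent` map (a stand-in for a real labeling scheme and structural join), and the names `build_tables` and `authors_of_title` are all assumptions made for illustration. The sketch answers a query of the form `//book[title="XML"]/author` by resolving the content predicate with a table lookup and then extracting the answer values from the table rather than from the original document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the paper.
DOC = """<bib>
  <book><title>XML</title><author>Ann</author></book>
  <book><title>SQL</title><author>Bob</author></book>
</bib>"""


def build_tables(xml_text):
    """Label element nodes in document order; store leaf text in a table.

    Only element nodes receive labels. Text contents are not labeled;
    they go into the relational table, keyed by their element's label.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE content (tag TEXT, label INTEGER, value TEXT)")
    parent = {}  # element label -> parent element label (assumed stand-in)
    counter = 0

    def visit(node, parent_label):
        nonlocal counter
        counter += 1
        label = counter
        parent[label] = parent_label
        children = list(node)
        text = (node.text or "").strip()
        if not children and text:
            conn.execute(
                "INSERT INTO content VALUES (?, ?, ?)", (node.tag, label, text)
            )
        for child in children:
            visit(child, label)

    visit(ET.fromstring(xml_text), None)
    return conn, parent


def authors_of_title(conn, parent, title):
    """Evaluate //book[title=<title>]/author using the content table."""
    # Content search: a table lookup yields labels of matching title elements.
    title_labels = [
        row[0]
        for row in conn.execute(
            "SELECT label FROM content WHERE tag = 'title' AND value = ?",
            (title,),
        )
    ]
    # Structural step (simplified): move from each title to its parent book.
    books = {parent[label] for label in title_labels}
    # Value extraction: author values come from the table, not the document.
    rows = conn.execute(
        "SELECT label, value FROM content WHERE tag = 'author' ORDER BY label"
    )
    return [value for label, value in rows if parent[label] in books]


conn, parent = build_tables(DOC)
print(authors_of_title(conn, parent, "XML"))
```

In this sketch the structural part of the query only ever sees element labels, so the variety of content values does not affect it; a full system would replace the `parent` map with a proper labeling scheme and a twig join algorithm.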