Querying XML data: efficiency and security issues

  • Authors:
  • Ada Wai-Chee Fu; Mingfei Jiang

  • Affiliations:
  • The Chinese University of Hong Kong (People's Republic of China); The Chinese University of Hong Kong (People's Republic of China)

  • Venue:
  • Querying XML data: efficiency and security issues
  • Year:
  • 2006

Abstract

XML is emerging as a widely used, platform-independent data representation language. With growing interest in XML data, techniques for handling XML are evolving rapidly. In this thesis, we study two important issues in querying XML data, efficiency and security, both of which are essential to an XML search engine. We take into consideration ID/IDREF attributes, which are common in XML documents. Most related work models an XML document with ID/IDREF attributes as a graph. We retain a tree model, called the extended XML tree, in which an IDREF attribute is regarded as an IDREF node instead of an IDREF edge to the corresponding node. Based on this model, we propose a B+-tree style index (PIN-tree) that integrates the essence of the path index and the inverted list approaches. We also propose a query evaluation algorithm, PINE, based on the model and the index. PINE efficiently handles queries with or without IDREF requests, and IDREF requests can be stated explicitly or implicitly. We prove that PINE is near optimal for twig queries without IDREF requests under the assumption that the number of distinct tag paths to a label is limited. Experiments show that this assumption is reasonable. The complexity of PINE for queries with IDREF requests is also given.

The security of XML data draws as much attention as the efficiency problem. In this thesis, we study a promising approach to storing accessibility information, which is based on the Compressed Accessibility Map (CAM). We make two advancements in this direction. (1) Previous work builds a different CAM for each user group and each operation type. We observe that performance and storage requirements can be further improved by combining multiple CAMs into an Integrated CAM (ICAM). We explore this possibility and propose an integration mechanism. (2) If the structure of the XML data does not change frequently, we suggest an efficient lookup method, which can be applied to CAMs or ICAMs and has a much lower time complexity than the previous approach. Experiments were conducted to show the effectiveness of our approaches.
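
The extended XML tree described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `Node` class, the helpers `build_extended_tree` and `tag_paths_by_label`, the sample document, and the `idref_attrs` parameter are all hypothetical names introduced here. Because `xml.etree.ElementTree` does not read DTD attribute types, the sketch assumes the caller names which attributes are IDREFs. Each IDREF attribute becomes a leaf node holding the target ID, so the structure stays a tree rather than a graph. The second helper collects the distinct tag paths that lead to each label, which is the quantity the abstract assumes to be limited.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                        # tag name, or "@attr" for an IDREF node
    target_id: Optional[str] = None   # set only on IDREF nodes
    children: List["Node"] = field(default_factory=list)


def build_extended_tree(elem, idref_attrs):
    """Copy an element tree, turning each IDREF attribute into a child node."""
    node = Node(label=elem.tag)
    for name, value in elem.attrib.items():
        if name in idref_attrs:
            # Assumption: the IDREF is kept as a leaf node that stores the
            # referenced ID, not as an edge to the referenced element.
            node.children.append(Node(label="@" + name, target_id=value))
    for child in elem:
        node.children.append(build_extended_tree(child, idref_attrs))
    return node


def tag_paths_by_label(root):
    """Map each label to the set of distinct root-to-node tag paths."""
    paths = defaultdict(set)
    stack = [(root, "/" + root.label)]
    while stack:
        node, path = stack.pop()
        paths[node.label].add(path)
        for child in node.children:
            stack.append((child, path + "/" + child.label))
    return paths


# Hypothetical sample document; "ref" is assumed to be declared as IDREF.
DOC = """
<library>
  <book id="b1"><title>XML Indexing</title><author ref="a1"/></book>
  <book id="b2"><title>Access Control</title><author ref="a1"/></book>
  <person id="a1"><name>Ada</name></person>
</library>
"""

root = build_extended_tree(ET.fromstring(DOC), idref_attrs={"ref"})
paths = tag_paths_by_label(root)
for label in sorted(paths):
    print(label, sorted(paths[label]))
```

In this sample, both `author` elements refer to the same person, yet no node has more than one parent. Following a reference would require a separate lookup from `target_id` to the element carrying that ID, which the sketch leaves out.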