Extended path expressions of XML

  • Authors:
  • Makoto Murata

  • Affiliations:
  • IBM Tokyo Research Lab/IUJ Research Institute, 1623-14, Shimotsuruma, Yamato-shi, Kanagawa-ken 242-8502, Japan

  • Venue:
  • PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year:
  • 2001

Abstract

Query languages for XML often use path expressions to locate elements in XML documents. Path expressions are regular expressions whose underlying alphabets represent conditions on nodes. They represent conditions on paths from the root, but not conditions on siblings, siblings of ancestors, or descendants of such siblings. To capture such conditions, we propose to extend the underlying alphabets. Each symbol in an extended alphabet is a triplet (e1, a, e2), where a is a condition on nodes, and e1 (resp. e2) is a condition on elder (resp. younger) siblings and their descendants; e1 and e2 are represented by hedge regular expressions, which are as expressive as hedge automata (hedges are ordered sequences of trees). Nodes matching such an extended path expression can be located by traversing the XML document twice. Furthermore, given an input schema and a query operation controlled by an extended path expression, it is possible to construct an output schema. This is done by identifying where in the input schema the given extended path expression is satisfied.
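To make the triplet (e1, a, e2) concrete, the following is a minimal sketch in Python, not the paper's algorithm. It makes several simplifying assumptions: the path is a fixed sequence of triplets rather than a full regular expression, the hedge conditions e1 and e2 are plain Python predicates rather than hedge regular expressions, and matching is done by a naive top-down walk rather than the two-traversal evaluation mentioned in the abstract. All names (Node, Step, locate, contains, labelled) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


Hedge = Sequence[Node]  # a hedge is an ordered sequence of trees


@dataclass
class Step:
    """One extended symbol (e1, a, e2); all three parts are predicates here."""
    elder: Callable[[Hedge], bool]    # e1: condition on elder siblings
    node: Callable[[Node], bool]      # a: condition on the node itself
    younger: Callable[[Hedge], bool]  # e2: condition on younger siblings


def anything(_hedge: Hedge) -> bool:
    """Hedge condition that is always satisfied."""
    return True


def labelled(name: str) -> Callable[[Node], bool]:
    """Node condition: the node carries the given label."""
    return lambda n: n.label == name


def _has(node: Node, name: str) -> bool:
    return node.label == name or any(_has(c, name) for c in node.children)


def contains(name: str) -> Callable[[Hedge], bool]:
    """Hedge condition: some tree in the hedge contains a node with this label."""
    return lambda hedge: any(_has(n, name) for n in hedge)


def locate(root: Node, steps: List[Step]) -> List[Node]:
    """Return nodes whose root-to-node path satisfies the steps one by one."""
    matches: List[Node] = []

    def walk(siblings: List[Node], index: int, depth: int) -> None:
        node, step = siblings[index], steps[depth]
        ok = (step.elder(siblings[:index])
              and step.node(node)
              and step.younger(siblings[index + 1:]))
        if not ok:
            return
        if depth == len(steps) - 1:
            matches.append(node)
            return
        for i in range(len(node.children)):
            walk(node.children, i, depth + 1)

    if steps:
        walk([root], 0, 0)
    return matches


# Sample document: three sections; only the first one contains a figure.
p1, p2, p3 = Node("para"), Node("para"), Node("para")
doc = Node("doc", [
    Node("section", [p1, Node("figure")]),
    Node("section", [Node("title"), p2]),
    Node("section", [p3]),
])

# Ordinary path doc/section/para: no sibling conditions at all.
plain = [Step(anything, labelled(x), anything) for x in ("doc", "section", "para")]

# Extended path: the section must have an elder sibling containing a figure.
extended = [
    Step(anything, labelled("doc"), anything),
    Step(contains("figure"), labelled("section"), anything),
    Step(anything, labelled("para"), anything),
]

plain_hits = locate(doc, plain)
extended_hits = locate(doc, extended)
```

In this sample, the ordinary path selects every para under a section, while the extended path skips the para in the first section because that section has no elder sibling containing a figure. This kind of condition on siblings of ancestors is what the abstract says ordinary path expressions cannot express.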