Evaluation Techniques for Generalized Path Pattern Queries on XML Data

  • Authors:
  • Xiaoying Wu;Dimitri Theodoratos;Stefanos Souldatos;Theodore Dalamagas;Timos Sellis

  • Affiliations:
  • New Jersey Institute of Technology, Newark, USA;New Jersey Institute of Technology, Newark, USA;National Technical University of Athens, Athens, Greece 10682;Institute for the Management of Information Systems, Athens, Greece;National Technical University of Athens, Athens, Greece 10682 and Institute for the Management of Information Systems, Athens, Greece

  • Venue:
  • World Wide Web
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path patterns or tree patterns. Current applications of XML require querying of data whose structure is complex or is not fully known to the user, or integrating XML data sources with different structures. These applications have motivated recently the introduction of query languages that allow a partial specification of path patterns in a query. In this paper, we consider partial path queries, a generalization of path pattern queries, and we focus on their efficient evaluation under the indexed streaming evaluation model. Our approach explicitly deals with repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted dags for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries. The first one exploits a structural summary of data to generate a set of path patterns that together are equivalent to a partial path query. To evaluate these path patterns, we extend a previous algorithm for path-pattern queries so that it can work on path patterns with repeated labels. The second one extracts a spanning tree from the query dag, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins the matches to compute the answer. Finally, the third one exploits multiple pointers of stack entries and a topological ordering of the query dag to apply a stack-based holistic technique. We analyze our algorithms and perform extensive experimental evaluations. Our experimental results show that the holistic algorithm outperforms the other ones. Our approaches are the first ones to efficiently evaluate this class of queries in the indexed streaming model.