Efficient processing of top-k twig queries over probabilistic XML data

  • Authors:
  • Bo Ning;Chengfei Liu;Jeffrey Xu Yu

  • Affiliations:
  • Dalian Maritime University, Liaoning, China;Swinburne University of Technology, VIC, Australia;The Chinese University of Hongkong, Hongkong, China

  • Venue:
  • World Wide Web
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

The flexibility of XML data model allows a more natural representation of uncertain data compared with the relational model. Matching twig pattern against XML data is a fundamental problem in querying information from XML documents. For a probabilistic XML document, each twig answer has a probabilistic value because of the uncertainty of data. The twig answers that have small probabilistic value are useless to the users, and usually users only want to get the answers with the k largest probabilistic values. To this end, existing algorithms for ordinary XML documents cannot be directly applicable due to the need for handling probability distributional nodes and efficient calculation of top-k probabilities of answers in probabilistic XML. In this paper, we address the problem of finding twig answers with top-k probabilistic values against probabilistic XML documents directly. We propose a new encoding scheme called PEDewey for probabilistic XML in this paper. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries. One is called ProTJFast, to process probabilistic XML data based on element streams in document order, and the other is called PTopKTwig, based on the element streams ordered by the path probability values. Experiments have been conducted to study the performance of these algorithms.