Efficient processing of top-k twig queries over probabilistic XML data

Authors:
Bo Ning;Chengfei Liu;Jeffrey Xu Yu
Affiliations:
Dalian Maritime University, Liaoning, China;Swinburne University of Technology, VIC, Australia;The Chinese University of Hong Kong, Hong Kong, China
Venue:
World Wide Web
Year:
2013

Abstract

The flexibility of the XML data model allows a more natural representation of uncertain data than the relational model does. Matching a twig pattern against XML data is a fundamental problem in querying XML documents. In a probabilistic XML document, each twig answer carries a probability value because the underlying data is uncertain. Twig answers with small probability values are of little use, and users usually want only the answers with the k largest probability values. Existing algorithms for ordinary XML documents are not directly applicable to this task, because it requires handling probability distributional nodes and computing the top-k probabilities of answers efficiently in probabilistic XML. In this paper, we address the problem of finding the twig answers with the top-k probability values directly against probabilistic XML documents. We propose a new encoding scheme for probabilistic XML, called PEDewey. Based on this encoding scheme, we design two algorithms for finding the answers with the top-k probabilities for twig queries. The first, ProTJFast, processes probabilistic XML data based on element streams in document order; the second, PTopKTwig, is based on element streams ordered by path probability values. Experiments have been conducted to study the performance of these algorithms.
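
The top-k selection described in the abstract can be illustrated with a generic threshold-style sketch. This is not the paper's ProTJFast or PTopKTwig algorithm, whose details the abstract does not give. It assumes, hypothetically, that each candidate answer arrives with an upper bound on its probability (standing in for a path probability), that candidates are sorted by that bound in descending order, and that the exact probability never exceeds the bound. The function name `top_k_by_probability` and the tuple layout are illustrative choices only.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical candidate layout: (answer_id, upper_bound, probability).
Candidate = Tuple[str, float, float]


def top_k_by_probability(candidates: Iterable[Candidate], k: int) -> List[Tuple[float, str]]:
    """Return the k answers with the largest probabilities, best first.

    Assumes candidates are sorted by upper_bound in descending order and
    that probability <= upper_bound for every candidate.
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, str]] = []  # min-heap keyed on probability
    for answer_id, upper_bound, probability in candidates:
        # Once k answers are held and the best remaining bound cannot beat
        # the current k-th best probability, no later candidate can qualify.
        if len(heap) == k and upper_bound <= heap[0][0]:
            break
        if len(heap) < k:
            heapq.heappush(heap, (probability, answer_id))
        elif probability > heap[0][0]:
            heapq.heapreplace(heap, (probability, answer_id))
    return sorted(heap, reverse=True)


# Made-up example data, not taken from the paper.
stream = [
    ("a", 0.9, 0.5),
    ("b", 0.8, 0.72),
    ("c", 0.6, 0.6),
    ("d", 0.4, 0.4),
    ("e", 0.3, 0.3),
]
result = top_k_by_probability(stream, 2)
print(result)  # [(0.72, 'b'), (0.6, 'c')]
```

In this example the scan stops at candidate `d`, because its bound of 0.4 cannot exceed the current second-best probability of 0.6. Ordering the input by a probability bound is what makes such early termination possible; a stream in document order would have to be read in full.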