Efficient probabilistic XML query processing using an extended labeling scheme and a lightweight index

Authors:
Jung-Hee Yun;Chin-Wan Chung
Affiliations:
Information Resource Division, eGovframework Center, National Information-Society Agency, Seoul, South Korea;Division of Web Science and Technology & Department of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Venue:
Information Processing and Management: an International Journal
Year:
2012


Abstract

Interest in data models and query processing for probabilistic XML data has grown recently. Probabilistic data has many potential applications, and the XML data model naturally represents hierarchical information and data uncertainty at different levels. However, previously proposed probabilistic XML data models and query processing techniques separate finding data matches from evaluating the probabilities of the results. They must therefore access the data repeatedly and retrieve the full data of the paths given in a query to calculate result probabilities. In this paper, we propose an extended interval-based labeling scheme for the probabilistic XML data tree and an efficient query processing procedure that uses the labeling scheme. In contrast to previous work, our method accesses only the labels of the data specified in a query and finds data matches while simultaneously evaluating the probability of each match. We also present an extended probabilistic XML query model with predicates on probability values, together with a lightweight index on those probabilities that eliminates unnecessary access to data that will not appear in the results. Experimental results show that our approach processes probabilistic XML queries efficiently and that our index scheme significantly improves query processing performance when predicates on probability values are given.
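
The abstract does not give the label layout or the index structure, so the following is only an illustrative sketch of the general idea, not the authors' scheme. It assumes a simple model in which every node carries a conditional existence probability given its parent, and it extends a classic (start, end, level) interval label with a hypothetical `path_prob` field holding the product of those probabilities from the root. Under that assumption, an ancestor-descendant match and its probability can both be read from the labels alone. A sorted list searched with `bisect` stands in for the lightweight probability index. All names (`Node`, `Label`, `assign_labels`, `match_path`, and so on) are invented for this example.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    tag: str
    prob: float = 1.0  # assumed: P(node exists | parent exists)
    children: List["Node"] = field(default_factory=list)


@dataclass
class Label:
    tag: str
    start: int
    end: int
    level: int
    path_prob: float  # assumed extension: product of probs from root to node


def assign_labels(root: Node) -> List[Label]:
    """Depth-first traversal assigning interval labels plus path probability."""
    labels: List[Label] = []
    counter = 0

    def visit(node: Node, level: int, parent_prob: float) -> None:
        nonlocal counter
        counter += 1
        label = Label(node.tag, counter, 0, level, parent_prob * node.prob)
        labels.append(label)
        for child in node.children:
            visit(child, level + 1, label.path_prob)
        counter += 1
        label.end = counter

    visit(root, 0, 1.0)
    return labels


def is_ancestor(a: Label, d: Label) -> bool:
    """Interval containment test: a is a proper ancestor of d."""
    return a.start < d.start and d.end < a.end


def build_prob_index(labels: List[Label], tag: str):
    """Stand-in for a probability index: labels of one tag sorted by path_prob."""
    entries = sorted((l for l in labels if l.tag == tag), key=lambda l: l.path_prob)
    return [l.path_prob for l in entries], entries


def match_path(labels: List[Label], anc_tag: str, desc_tag: str,
               min_prob: float = 0.0) -> List[Tuple[int, int, float]]:
    """Evaluate anc_tag//desc_tag using labels only.

    Returns (ancestor start, descendant start, match probability). Since a
    descendant can only exist if its ancestors do, the match probability
    under the assumed model is the descendant's path_prob.
    """
    probs, entries = build_prob_index(labels, desc_tag)
    # Skip every descendant whose probability is below the threshold.
    candidates = entries[bisect_left(probs, min_prob):]
    ancestors = [l for l in labels if l.tag == anc_tag]
    return [(a.start, d.start, d.path_prob)
            for a in ancestors for d in candidates if is_ancestor(a, d)]


# Toy document: two uncertain books, each with an uncertain title.
tree = Node("library", 1.0, [
    Node("book", 0.8, [Node("title", 0.5)]),
    Node("book", 0.25, [Node("title", 1.0)]),
])
labels = assign_labels(tree)

print(match_path(labels, "book", "title"))                # all matches
print(match_path(labels, "book", "title", min_prob=0.3))  # thresholded
```

In this sketch the threshold predicate is applied before any structural test, which mirrors the stated purpose of the index: data that cannot satisfy the probability predicate is never examined.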