Top-k keyword search over probabilistic XML data

Authors:
Jianxin Li; Chengfei Liu; Rui Zhou; Wei Wang
Affiliations:
Swinburne University of Technology, Australia; Swinburne University of Technology, Australia; Swinburne University of Technology, Australia; University of New South Wales, Australia
Venue:
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Year:
2011

Cited by (10):
Semantic relevance ranking for XML keyword search. Information Sciences: an International Journal.
Optimal top-k generation of attribute combinations based on ranked lists. SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data.
Keywords filtering over probabilistic XML data. APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications.
Bayesian network-based probabilistic XML keywords filtering. DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications.
XML filtering with XPath expressions containing parent and ancestor axes. Information Sciences: an International Journal.
ELCA evaluation for keyword search on probabilistic XML data. World Wide Web.
Efficient processing of top-k twig queries over probabilistic XML data. World Wide Web.
Search and result presentation in scientific workflow repositories. Proceedings of the 25th International Conference on Scientific and Statistical Database Management.
Enhancing web revisitation by contextual keywords. ICWE'13 Proceedings of the 13th international conference on Web Engineering.
XML keyword search with promising result type recommendations. World Wide Web.

Abstract

Despite the proliferation of work on XML keyword queries, supporting keyword queries over probabilistic XML data remains an open problem. Compared with traditional keyword search, answering a keyword query over probabilistic XML data is far more expensive because possible-world semantics must be taken into account. In this paper, we first define the new problem of top-k keyword search over probabilistic XML data: retrieving the k SLCA results with the k highest probabilities of existence. We then propose two efficient algorithms. The first, PrStack, finds the k SLCA results with the k highest probabilities by scanning the relevant keyword nodes only once. To further improve efficiency, the second, EagerTopK, uses a set of pruning properties to quickly discard unqualified SLCA candidates. Finally, we implement both algorithms and compare their performance through an analysis of extensive experimental results.
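The problem definition can be made concrete with a brute-force baseline. The sketch below is a minimal illustration, not the paper's PrStack or EagerTopK. It assumes a simplified probabilistic XML model in which every non-root node exists independently with a given probability, conditional on its parent existing (independent edges only; mutually exclusive choices are not modeled). It enumerates every possible world, computes the SLCAs of the query in that world, and sums the world probabilities per node. The toy document `DOC` and all function names are hypothetical. The cost is exponential in the number of uncertain edges, which shows why possible-world semantics makes the problem expensive.

```python
from collections import defaultdict
from itertools import product
import heapq

# Hypothetical toy document: node -> (parent, edge probability, keywords).
# Assumption: each edge exists independently given its parent (no mux choices).
DOC = {
    "r":  (None, 1.0, set()),
    "a":  ("r", 0.8, set()),
    "b":  ("r", 0.6, set()),
    "a1": ("a", 0.9, {"xml"}),
    "a2": ("a", 0.5, {"search"}),
    "b1": ("b", 0.7, {"xml", "search"}),
}

def is_ancestor(doc, a, d):
    """True if a is a strict ancestor of d."""
    d = doc[d][0]
    while d is not None:
        if d == a:
            return True
        d = doc[d][0]
    return False

def _exists(doc, kept, n):
    """A node exists only if every edge on its path to the root was kept."""
    while n is not None:
        if n not in kept:
            return False
        n = doc[n][0]
    return True

def possible_worlds(doc):
    """Yield (set of existing nodes, probability) for every edge assignment."""
    uncertain = [n for n, (parent, _, _) in doc.items() if parent is not None]
    roots = {n for n, (parent, _, _) in doc.items() if parent is None}
    for choice in product([True, False], repeat=len(uncertain)):
        prob = 1.0
        kept = set(roots)
        for n, keep in zip(uncertain, choice):
            p = doc[n][1]
            prob *= (p if keep else 1.0 - p)
            if keep:
                kept.add(n)
        yield {n for n in kept if _exists(doc, kept, n)}, prob

def slcas(doc, alive, query):
    """SLCAs in one world: nodes covering all keywords with no covering descendant."""
    cover = defaultdict(set)
    for n in alive:
        hits = doc[n][2] & query
        m = n
        while m is not None:  # propagate this node's keywords to its ancestors
            cover[m] |= hits
            m = doc[m][0]
    full = {n for n in alive if cover[n] >= query}
    return {n for n in full if not any(is_ancestor(doc, n, d) for d in full)}

def topk_slca(doc, query, k):
    """Return the k nodes with the highest probability of being an SLCA."""
    prob = defaultdict(float)
    for alive, p in possible_worlds(doc):
        for n in slcas(doc, alive, query):
            prob[n] += p
    return heapq.nlargest(k, prob.items(), key=lambda kv: kv[1])

print(topk_slca(DOC, {"xml", "search"}, 2))
```

On this toy document, `b1` is an SLCA whenever it exists (0.6 × 0.7 = 0.42), and `a` is an SLCA exactly when both `a1` and `a2` exist (0.8 × 0.9 × 0.5 = 0.36), so the top-2 answer is `b1` followed by `a`. The root `r` is never an SLCA, because whenever its subtree covers both keywords, `a` or `b1` already does.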