Efficient keyword search over data-centric XML documents

  • Authors:
  • Guoliang Li;Jianhua Feng;Na Ta;Lizhu Zhou

  • Affiliations:
  • Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China;Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China;Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China;Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China

  • Venue:
  • APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
  • Year:
  • 2007

Abstract

In this paper, we investigate keyword search over data-centric XML documents. We first present a novel method for dividing an XML document into self-integrated subtrees, which are connected subtrees that capture different structural information of the document. We then propose meaningful self-integrated trees, which contain all the query keywords and describe how those keywords are interrelated, as answers to keyword queries over XML documents. In addition, we introduce a B+-tree index to accelerate the retrieval of these meaningful self-integrated trees. To further improve performance, we use a Bloom filter to make generating these trees more efficient. Finally, we conduct extensive experiments to evaluate our method; the results show that it achieves high efficiency and significantly outperforms existing approaches.
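
The abstract does not say how the Bloom filter is applied, so the following is only a generic sketch of the data structure, under the assumption that one filter is kept per subtree and consulted before any expensive tree generation. The class `BloomFilter`, the helper `candidate_subtrees`, the subtree ids, and the bit and hash counts are all hypothetical names and values chosen for illustration; they are not taken from the paper.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter; sizes and hash scheme are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def candidate_subtrees(filters, keywords):
    """Return ids of subtrees whose filter may contain every keyword."""
    return [
        sid
        for sid, bf in filters.items()
        if all(bf.might_contain(k) for k in keywords)
    ]


# Hypothetical subtree ids and keyword sets, for illustration only.
subtree_keywords = {
    "s1": ["xml", "keyword", "search"],
    "s2": ["bloom", "filter"],
}
filters = {}
for sid, words in subtree_keywords.items():
    bf = BloomFilter()
    for w in words:
        bf.add(w)
    filters[sid] = bf
```

A Bloom filter never reports a stored keyword as absent, so a subtree skipped by `candidate_subtrees` cannot be an answer. It can report false positives, so each surviving candidate still needs an exact check against the real index.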