Efficient LCA based keyword search in xml data

  • Authors:
  • Yu Xu;Yannis Papakonstantinou

  • Affiliations:
  • Teradata, San Diego, CA;University of California San Diego, San Diego, CA

  • Venue:
  • Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Keyword search in XML documents based on the notion of lowest common ancestors (LCAs) and modifications of it has recently gained research interest [2, 3, 4]. In this paper we propose an efficient algorithm called Indexed Stack to find answers to keyword queries based on XRank's semantics to LCA [2]. The complexity of the Indexed Stack algorithm is O(kd|S1|\log|S|) where k is the number of keywords in the query, d is the depth of the tree and |S1 | (|S|) is the occurrence of the least (most) frequent keyword in the query. In comparison, the best worst case complexity of the core algorithms in [2] is O(kd|S|). We analytically and experimentally evaluate the Indexed Stack algorithm and the two core algorithms in [2]. The results show that the Indexed Stack algorithm outperforms in terms of both CPU and I/O costs other algorithms by orders of magnitude when the query contains at least one low frequency keyword along with high frequency keywords.