Efficient LCA based keyword search in XML data

Authors:
Yu Xu;Yannis Papakonstantinou
Affiliations:
Teradata, San Diego, CA;University of California San Diego, San Diego, CA
Venue:
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Year:
2007

References (3)

XRANK: ranked keyword search over XML documents
Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Efficient keyword search for smallest LCAs in XML databases
Proceedings of the 2005 ACM SIGMOD international conference on Management of data

Schema-free XQuery
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Cited by (4)

Retrieving meaningful relaxed tightest fragments for XML keyword search
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

Hash-Search: An Efficient SLCA-Based Keyword Search Algorithm on XML Documents
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications

Editorial: BioDB: An ontology-enhanced information system for heterogeneous biological information
Data & Knowledge Engineering

Distributed SLCA-based XML keyword search by map-reduce
DASFAA '10 Proceedings of the 15th international conference on Database systems for advanced applications

Abstract

Keyword search in XML documents based on the notion of lowest common ancestors (LCAs) and its variations has recently attracted research interest [2, 3, 4]. In this paper we propose an efficient algorithm, called Indexed Stack, that answers keyword queries under XRank's LCA-based semantics [2]. The complexity of the Indexed Stack algorithm is $O(kd|S_1|\log|S|)$, where $k$ is the number of keywords in the query, $d$ is the depth of the tree, and $|S_1|$ ($|S|$) is the number of occurrences of the least (most) frequent keyword in the query. In comparison, the best worst-case complexity of the core algorithms in [2] is $O(kd|S|)$. Since both bounds share the factor $kd$, the Indexed Stack bound is the smaller one whenever $|S_1|\log|S| < |S|$, that is, when the least frequent keyword is substantially rarer than the most frequent one. We evaluate the Indexed Stack algorithm and the two core algorithms in [2] both analytically and experimentally. The results show that, when the query contains at least one low-frequency keyword along with high-frequency keywords, the Indexed Stack algorithm outperforms the other algorithms by orders of magnitude in both CPU and I/O costs.
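The abstract only names XRank's semantics, so the sketch below is a brute-force reference check of which nodes XRank counts as answers: a node qualifies if, after excluding the subtrees of its proper descendants that already contain every keyword, it still contains at least one occurrence of each keyword. It is not the Indexed Stack algorithm and makes no attempt at its $O(kd|S_1|\log|S|)$ bound. The Dewey-label encoding (tuples, with the root as `(0,)`), the input format (one list of labels per keyword), and the names `xrank_answers`, `contains_all`, and `is_ancestor_or_self` are hypothetical choices made for this illustration.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a node is its Dewey label as a tuple,
# the root is (0,), and (0, 2, 1) is the 2nd child of the 3rd child of the root.
Dewey = Tuple[int, ...]


def is_ancestor_or_self(u: Dewey, x: Dewey) -> bool:
    """u is an ancestor of x, or x itself, iff u's label is a prefix of x's."""
    return x[:len(u)] == u


def contains_all(u: Dewey, lists: List[List[Dewey]]) -> bool:
    """True if the subtree rooted at u holds an occurrence of every keyword."""
    return all(any(is_ancestor_or_self(u, x) for x in s) for s in lists)


def xrank_answers(lists: List[List[Dewey]]) -> Set[Dewey]:
    """Naive reference check of XRank's answer semantics (not Indexed Stack).

    lists[i] holds the Dewey labels of the nodes that contain keyword i.
    """
    # Any node that contains a keyword is an ancestor-or-self of an occurrence.
    candidates = {x[:i] for s in lists for x in s for i in range(1, len(x) + 1)}
    full = {u for u in candidates if contains_all(u, lists)}

    def survives(u: Dewey, x: Dewey) -> bool:
        # Occurrence x counts for u only if x lies under u and no proper
        # descendant of u on the path down to x already contains all keywords.
        return is_ancestor_or_self(u, x) and not any(
            x[:i] in full for i in range(len(u) + 1, len(x) + 1)
        )

    return {u for u in full if all(any(survives(u, x) for x in s) for s in lists)}


if __name__ == "__main__":
    # Keyword 1 occurs at (0,0,0) and (0,1); keyword 2 at (0,0,1) and (0,2).
    answers = xrank_answers([[(0, 0, 0), (0, 1)], [(0, 0, 1), (0, 2)]])
    print(sorted(answers))  # [(0,), (0, 0)]
```

In the example, `(0, 0)` qualifies because both keywords occur beneath it, and the root `(0,)` also qualifies through the occurrences at `(0, 1)` and `(0, 2)`, which lie outside `(0, 0)`. If the occurrence at `(0, 2)` is dropped, the root's only remaining occurrence of keyword 2 sits inside `(0, 0)` and is excluded, so only `(0, 0)` is returned. The scan is much costlier than the bounds stated in the abstract; it is only meant to pin down which nodes count as answers.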