An effective and efficient approach for keyword-based XML retrieval

  • Authors:
  • Xiaoguang Li;Jian Gong;Daling Wang;Ge Yu

  • Affiliations:
  • School of Information Science and Engineering, Northeastern University, Shenyang, P.R.China;School of Information Science and Engineering, Northeastern University, Shenyang, P.R.China;School of Information Science and Engineering, Northeastern University, Shenyang, P.R.China;School of Information Science and Engineering, Northeastern University, Shenyang, P.R.China

  • Venue:
  • WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management
  • Year:
  • 2005

Abstract

IR-style keyword search over XML documents has become the most common tool for XML querying, since users need not know the structure of the target document before constructing a query. For a keyword-based XML search engine, the key issue is how to efficiently return sets of meaningfully related nodes in response to a user's query. Current approaches typically store the relationship between each pair of nodes in an XML document in an index, which leads to serious storage overhead. In this paper, we propose an enhanced inverted index structure (PN-Inverted Index) that stores path information in addition to node IDs, and we adopt the concept of the lowest common ancestor (LCA) and extend it to PLCA. Based on these concepts, we design efficient algorithms that check the relationship among an arbitrary number of nodes. Compared with existing approaches, ours does not need to create an additional relationship index; it only uses the inverted index that is already common in IR-style keyword search engines. Experimental results show that, while still returning meaningful answers, our search engine offers substantial performance benefits. Although the inverted index itself grows, the total size of the search engine's indices is smaller than in existing approaches.
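
The general idea of keeping path information in the postings can be illustrated with a minimal sketch. It is not the paper's PN-Inverted Index or its PLCA algorithm, whose definitions the abstract does not give. It assumes that each posting stores the tuple of node IDs from the root down to the matching node, and it finds the ordinary LCA of several nodes as the last element of their longest common path prefix. The names `build_index` and `lca_of_paths`, the preorder node numbering, and the sample document are all hypothetical.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def build_index(xml_text):
    """Map each keyword to postings of (node_id, path).

    Assumption: `path` is the tuple of node IDs from the root to the node,
    with IDs assigned in document (preorder) order starting at 0.
    """
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    counter = 0

    def visit(elem, parent_path):
        nonlocal counter
        node_id = counter
        counter += 1
        path = parent_path + (node_id,)
        # Index only the text directly inside the element (tail text is ignored).
        for word in (elem.text or "").lower().split():
            index[word].append((node_id, path))
        for child in elem:
            visit(child, path)

    visit(root, ())
    return index


def lca_of_paths(paths):
    """Return the ID of the lowest common ancestor of any number of nodes.

    The LCA is the last node on the longest common prefix of the paths,
    so no separate pairwise relationship index is consulted.
    """
    prefix = paths[0]
    for p in paths[1:]:
        n = 0
        while n < min(len(prefix), len(p)) and prefix[n] == p[n]:
            n += 1
        prefix = prefix[:n]
    return prefix[-1]


# Hypothetical sample document: two books under one root.
DOC = (
    "<bib>"
    "<book><title>XML retrieval</title><author>Li</author></book>"
    "<book><title>Databases</title><author>Yu</author></book>"
    "</bib>"
)

index = build_index(DOC)
same_book = lca_of_paths([index["xml"][0][1], index["li"][0][1]])
different_books = lca_of_paths([index["xml"][0][1], index["yu"][0][1]])
print(same_book, different_books)
```

In this sketch, "xml" and "li" meet at the first `book` element, while "xml" and "yu" meet only at the root. A search engine could use that difference to decide whether the matched nodes are meaningfully related. Storing the path in every posting makes the inverted index larger, which is the trade-off the abstract describes.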