Querying structured text in an XML database

  • Authors: Shurug Al-Khalifa; Cong Yu; H. V. Jagadish
  • Affiliations: University of Michigan, Ann Arbor, MI (all authors)
  • Venue: Proceedings of the 2003 ACM SIGMOD international conference on Management of data
  • Year: 2003

Abstract

XML databases often contain documents comprising structured text. Therefore, it is important to integrate "information retrieval style" query evaluation, which is well-suited for natural language text, with standard "database style" query evaluation, which handles structured queries efficiently. Relevance scoring is central to information retrieval. In the case of XML, this operation becomes more complex because the data required for scoring could reside not only directly in an element itself but also in its descendant elements.

In this paper, we propose a bulk algebra, TIX, and describe how it can be used as a basis for integrating information retrieval techniques into a standard pipelined database query evaluation engine. We develop new evaluation strategies essential to obtaining good performance, including a stack-based TermJoin algorithm for efficiently scoring composite elements. We report results from an extensive experimental evaluation, which show, among other things, that the new TermJoin access method outperforms a direct implementation of the same functionality using standard operators by a large factor.
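To make the idea of scoring composite elements with a stack more concrete, the following is a minimal illustrative sketch, not the TermJoin algorithm from the paper. It assumes that each element carries a hypothetical (start, end) region encoding, that elements are properly nested, that element names are unique, and that the "score" is simply the number of term occurrences in an element's subtree. The function name `score_elements` and the sample data are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical region encoding: (start, end, name); a child's interval
# lies strictly inside its parent's interval.
Element = Tuple[int, int, str]


def score_elements(elements: List[Element],
                   term_positions: List[int]) -> Dict[str, int]:
    """Count term occurrences per element, descendants included, in one pass."""
    elems = sorted(elements)          # document order (by start position)
    positions = sorted(term_positions)
    scores: Dict[str, int] = {}
    stack: List[list] = []            # open elements: [start, end, name, count]

    def pop_closed(pos: float) -> None:
        # Close every open element that ends before pos and pass its
        # count up to its parent, which is the new top of the stack.
        while stack and stack[-1][1] < pos:
            _, _, name, count = stack.pop()
            scores[name] = count
            if stack:
                stack[-1][3] += count

    def push(elem: Element) -> None:
        start, end, name = elem
        pop_closed(start)
        stack.append([start, end, name, 0])

    i = 0
    for pos in positions:
        # Open every element that starts before this term occurrence.
        while i < len(elems) and elems[i][0] < pos:
            push(elems[i])
            i += 1
        pop_closed(pos)
        if stack:
            stack[-1][3] += 1         # credit the innermost open element

    # Elements that start after the last occurrence still get a score.
    while i < len(elems):
        push(elems[i])
        i += 1
    pop_closed(float("inf"))
    return scores


if __name__ == "__main__":
    doc = [(1, 20, "article"), (2, 9, "sec1"), (3, 5, "p1"),
           (6, 8, "p2"), (10, 19, "sec2"), (11, 13, "p3")]
    print(score_elements(doc, [4, 7, 12, 15]))
```

In this sketch each occurrence is credited only to the innermost open element, and counts are propagated to the parent when an element is popped, so both input lists are scanned once. A real relevance function would likely weight terms rather than count them; that part is deliberately left out.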