Indexing and searching XML documents based on content and structure synopses

  • Authors:
  • Weimin He;Leonidas Fegaras;David Levine

  • Affiliations:
  • University of Texas at Arlington, CSE, Arlington, TX;University of Texas at Arlington, CSE, Arlington, TX;University of Texas at Arlington, CSE, Arlington, TX

  • Venue:
  • BNCOD'07: Proceedings of the 24th British National Conference on Databases
  • Year:
  • 2007

Abstract

We present a novel framework for indexing and searching schema-less XML documents based on concise summaries of their structural and textual content. Our search query language is XPath extended with full-text search. We introduce two novel data synopsis structures that correlate textual with positional information in an XML document and thereby improve query precision. In addition, we present a two-phase containment filtering algorithm, based on these synopses, that makes the search process more efficient. Our experimental evaluation shows that (1) our data synopsis indexing scheme outperforms the standard XML indexing scheme based on inverted lists; (2) query evaluation based on our data synopses is more accurate than related approximate approaches that do not consider positional information; and (3) our two-phase containment filtering algorithm is more efficient than a single-phase brute-force algorithm.