Answering XML queries by means of data summaries

  • Authors: Elena Baralis; Paolo Garza; Elisa Quintarelli; Letizia Tanca

  • Affiliations: Politecnico di Torino, Torino, Italy; Politecnico di Torino, Torino, Italy; Politecnico di Milano, Milano, Italy; Politecnico di Milano, Milano, Italy

  • Venue: ACM Transactions on Information Systems (TOIS)
  • Year: 2007

Abstract

XML is a rather verbose representation of semistructured data, which may require huge amounts of storage space. We propose a summarized representation of XML data, based on the concept of instance pattern, which can both provide succinct information and be directly queried. The physical representation of instance patterns exploits itemsets or association rules to summarize the content of XML datasets. Instance patterns may be used to answer queries, possibly only partially, either when fast and approximate answers are required or when the actual dataset is not available, for example because it is currently unreachable. Experiments on large XML documents show that instance patterns allow a significant reduction in storage space while preserving almost entirely the completeness of the query result. Furthermore, they provide fast query answers and scale well with the size of the dataset, thus overcoming the document size limitation of most current XQuery engines.
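
The idea of summarizing XML content with itemsets and association rules, and then querying the summary instead of the document, can be illustrated with a small sketch. This is not the authors' implementation or their definition of instance patterns: the toy document, the function names (`transactions`, `frequent_itemsets`, `association_rules`, `answer`), the thresholds, and the brute-force mining step are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import xml.etree.ElementTree as ET

# Hypothetical toy dataset; not taken from the paper.
XML = """
<conferences>
  <paper><author>Smith</author><venue>VLDB</venue><year>2005</year></paper>
  <paper><author>Smith</author><venue>VLDB</venue><year>2006</year></paper>
  <paper><author>Smith</author><venue>ICDE</venue><year>2006</year></paper>
  <paper><author>Jones</author><venue>VLDB</venue><year>2006</year></paper>
</conferences>
"""

def transactions(xml_text, record_tag):
    """Turn each record element into a set of (tag, value) items."""
    root = ET.fromstring(xml_text)
    return [frozenset((child.tag, child.text) for child in rec)
            for rec in root.iter(record_tag)]

def frequent_itemsets(txns, min_support, max_size=2):
    """Brute-force itemset mining (assumed here; fine only for tiny inputs)."""
    counts = Counter()
    for t in txns:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    n = len(txns)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

def association_rules(itemsets, min_confidence):
    """Derive rules body -> head from the frequent 2-itemsets."""
    rules = []
    for s, supp in itemsets.items():
        if len(s) != 2:
            continue
        for body in s:
            head = next(iter(s - {body}))
            # Every subset of a frequent itemset is itself frequent,
            # so the body's support is always present in `itemsets`.
            conf = supp / itemsets[frozenset([body])]
            if conf >= min_confidence:
                rules.append((body, head, supp, conf))
    return rules

def answer(rules, body_item, head_tag):
    """Answer 'which <head_tag> values co-occur with body_item?' from the summary only."""
    return sorted(head[1] for body, head, _, _ in rules
                  if body == body_item and head[0] == head_tag)

txns = transactions(XML, "paper")
itemsets = frequent_itemsets(txns, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.6)
print(answer(rules, ("author", "Smith"), "venue"))
```

In this toy run the summary reports only VLDB as a venue for Smith, whereas the full document also contains ICDE. The infrequent pair falls below the support threshold and is dropped, which is the kind of fast but possibly partial answer the abstract describes.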