Concise descriptions of subsets of structured sets

  • Authors: Ken Q. Pu; Alberto O. Mendelzon
  • Affiliations: University of Toronto, Canada; University of Toronto, Canada
  • Venue: ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2003
  • Year: 2005

Abstract

We study the problem of economical representation of subsets of structured sets, which are sets equipped with a set cover or a family of preorders. Given a structured set U, and a language L whose expressions define subsets of U, the problem of minimum description length in L (L-MDL) is: “given a subset V of U, find a shortest string in L that defines V.” Depending on the structure and the language, the MDL-problem is in general intractable. We study the complexity of the MDL-problem for various structures and show that certain specializations are tractable. The families of focus are hierarchy, linear order, and their multidimensional extensions; these are found in the context of statistical and OLAP databases. In the case of general OLAP databases, data organization is a mixture of multidimensionality, hierarchy, and ordering, which can also be viewed naturally as a cover-structured ordered set. Efficient algorithms are provided for the MDL-problem for hierarchical and linearly ordered structures, and we prove that the multidimensional extensions are NP-complete. Finally, we illustrate the application of the theory to summarization of large result sets and (multi) query optimization for ROLAP queries.
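
To make the hierarchical case concrete, here is a minimal sketch under a simplifying assumption: the language L is restricted to unions of hierarchy nodes, where each node denotes the set of leaves beneath it. This is an illustration only and not necessarily the language or algorithm used in the paper. The function names and the example hierarchy are hypothetical. Under this assumption, a shortest description of V consists of the topmost nodes whose leaves all lie in V.

```python
from typing import Dict, List, Set

# Hypothetical hierarchy: each internal node maps to its children;
# nodes that do not appear as keys are leaves (the elements of U).
Tree = Dict[str, List[str]]


def leaves_under(tree: Tree, node: str) -> Set[str]:
    """Return the set of leaves in the subtree rooted at `node`."""
    children = tree.get(node, [])
    if not children:
        return {node}
    result: Set[str] = set()
    for child in children:
        result |= leaves_under(tree, child)
    return result


def shortest_union_description(tree: Tree, root: str, subset: Set[str]) -> List[str]:
    """Describe `subset` as a union of as few hierarchy nodes as possible.

    Assumption: only unions of nodes are allowed (no negation or difference).
    A node is used as soon as every leaf beneath it belongs to `subset`;
    otherwise the search descends into its children.
    """
    if leaves_under(tree, root) <= subset:
        return [root]
    description: List[str] = []
    for child in tree.get(root, []):
        description += shortest_union_description(tree, child, subset)
    return description


# Example hierarchy, invented for illustration.
example_tree: Tree = {
    "All": ["East", "West"],
    "East": ["NY", "Boston"],
    "West": ["LA", "SF"],
}

print(shortest_union_description(example_tree, "All", {"NY", "Boston", "LA"}))
# prints ['East', 'LA']
```

In this example the subset {NY, Boston, LA} is described by two symbols instead of three. The sketch recomputes leaf sets at every level for clarity; a single bottom-up pass would avoid the repeated work.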