Multi-structural databases

  • Authors: Ronald Fagin, R. Guha, Ravi Kumar, Jasmine Novak, D. Sivakumar, Andrew Tomkins

  • Affiliations: IBM Almaden Research Center, San Jose, CA (all six authors)

  • Venue: Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems
  • Year: 2005

Abstract

We introduce the Multi-Structural Database, a new data framework to support efficient analysis of large, complex datasets. An instance of the model consists of a set of data objects, together with a schema that specifies segmentations of the set of data objects according to multiple distinct criteria (e.g., into a taxonomy based on a hierarchical attribute). Within this model, we develop a rich set of analytical operations and design highly efficient algorithms for these operations. Our operations are formulated as optimization problems and allow the user to analyze the underlying data in terms of the allowed segmentations.
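The abstract does not spell out a concrete operation, so what follows is only a minimal sketch under our own assumptions. Each dimension is a taxonomy (a tree) stored as a parent-to-children dict. Every data object is tagged with one leaf per dimension. A segmentation along a dimension is a set of pairwise-disjoint taxonomy nodes that together cover all objects. The hypothetical query `min_max_segmentation` (our name, not an operator from the paper) picks at most k such nodes so that the largest segment is as small as possible. It treats the analysis as an optimization problem over the allowed segmentations and solves it by dynamic programming on the tree. The taxonomies `LOCATION` and `TOPIC` and the objects in `OBJECTS` are made-up illustration data.

```python
from collections import Counter
from functools import lru_cache
import math

# Hypothetical schema: one taxonomy per dimension, stored as parent -> children.
LOCATION = {
    "World": ["USA", "Europe"],
    "USA": ["CA", "NY"],
    "Europe": ["France", "Germany"],
}
TOPIC = {"All": ["sports", "politics"]}

# Hypothetical data objects, each tagged with a leaf in every dimension.
OBJECTS = [
    {"location": "CA", "topic": "sports"},
    {"location": "CA", "topic": "politics"},
    {"location": "CA", "topic": "sports"},
    {"location": "NY", "topic": "politics"},
    {"location": "France", "topic": "sports"},
    {"location": "Germany", "topic": "politics"},
]


def subtree_sizes(tax, root, leaf_counts):
    """Count the objects under each taxonomy node (objects sit at leaves)."""
    sizes = {}

    def visit(v):
        sizes[v] = leaf_counts.get(v, 0) + sum(visit(c) for c in tax.get(v, []))
        return sizes[v]

    visit(root)
    return sizes


def min_max_segmentation(tax, root, objects, dim, k):
    """Smallest achievable size of the largest segment when the objects are
    segmented along `dim` into at most k disjoint nodes covering everything."""
    if k < 1:
        raise ValueError("need at least one segment")
    sizes = subtree_sizes(tax, root, Counter(o[dim] for o in objects))

    @lru_cache(maxsize=None)
    def best(v, j):
        # Option 1: keep the whole subtree of v as a single segment.
        result = sizes[v]
        kids = tax.get(v, [])
        if kids and j >= len(kids):
            # Option 2: split v; every child needs at least one segment.
            # g[b] = best largest-segment size over the children processed so
            # far using at most b segments (math.inf when infeasible).
            g = [0] * (j + 1)
            for c in kids:
                g = [
                    min((max(g[b - t], best(c, t)) for t in range(1, b + 1)),
                        default=math.inf)
                    for b in range(j + 1)
                ]
            result = min(result, g[j])
        return result

    return best(root, k)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, min_max_segmentation(LOCATION, "World", OBJECTS, "location", k))
    # 1 6
    # 2 4
    # 3 3
```

The same objects can be analyzed along either dimension; for example, `min_max_segmentation(TOPIC, "All", OBJECTS, "topic", 2)` returns 3. The dynamic program combines children knapsack-style, so its cost grows with the number of tree nodes times k squared. Real multi-structural queries may use different objectives and algorithms.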