Deciding implication for functional dependencies in complex-value databases

  • Authors:
  • Sven Hartmann; Sebastian Link

  • Affiliations:
  • Department of Information Systems, Information Science Research Centre, Massey University, Palmerston North, New Zealand (both authors)

  • Venue:
  • Theoretical Computer Science - Logic, language, information and computation
  • Year:
  • 2006

Abstract

Modern applications increasingly require the storage of data beyond relational structure. The challenge of providing well-founded data models that can handle complex objects such as lists, sets, multisets, unions and references has not yet been met in a completely satisfactory way. The success of such data models will greatly depend on the existence of automated database design techniques that generalise achievements from relational databases. In this paper, we study the implication problem of functional dependencies (FDs) in the presence of records, sets, multisets and lists. Database schemata are defined as nested attributes, database instances as nested relations, and FDs in terms of subattributes of the database schema. The expressiveness of these FDs differs fundamentally from that of previous approaches in other data models, including the nested relational data model and XML. The implication problem is to decide whether, for an arbitrary database schema and an arbitrary set Σ ∪ {σ} of FDs defined on that schema, every database instance that satisfies all FDs in Σ also satisfies σ. The difficulty in generalising the solution from the relational data model to the presence of sets and multisets is caused by the fact that the value on the join of subattributes is no longer determined by the values on the subattributes. Based on the notion of a unit, we propose to decompose the database schema in such a way that the closure of a set of nested attributes can be computed on the components of the schema. The implementation of the algorithm is based on a representation theorem for Brouwerian algebras. The main contribution is the proof that the algorithm works correctly and runs in polynomial time in the size of the input. Defining the size of the input is not trivial, since the measure should both generalise the one used for relational databases and do justice to the presence of sets and multisets.
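For orientation, the following is a minimal sketch of the classical flat relational baseline that the paper generalises, not the authors' nested-attribute algorithm (which relies on units and Brouwerian algebras). It assumes flat FDs encoded as pairs of frozensets of single-letter attribute names; the helpers `fd`, `closure`, `implies` and `project` are hypothetical names chosen for illustration. In the relational case, Σ implies X → Y exactly when Y is contained in the closure X⁺. The last few lines show, with a two-element set of pairs, why this breaks for sets: two different sets can agree on both projections.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical encoding: an FD X -> Y over flat attribute names (relational case only).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from strings of single-letter attribute names, e.g. fd("AB", "C")."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Classical attribute closure X+: repeatedly fire FDs whose left side is covered."""
    result = set(attrs)
    fd_list: List[FD] = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fd_list:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds: Iterable[FD], sigma: FD) -> bool:
    """Relational case: Sigma implies X -> Y iff Y is a subset of the closure of X."""
    lhs, rhs = sigma
    return rhs <= closure(lhs, fds)


def project(pairs: Set[Tuple[str, str]], i: int) -> FrozenSet[str]:
    """Project a set of (A, B) pairs onto component i (0 for A, 1 for B)."""
    return frozenset(p[i] for p in pairs)


# With sets, the value on the join L{A,B} is not determined by the values on
# L{A} and L{B}: these two different sets have identical projections.
s1 = {("a1", "b1"), ("a2", "b2")}
s2 = {("a1", "b2"), ("a2", "b1")}
```

For example, `implies([fd("A", "B"), fd("B", "C")], fd("A", "C"))` is `True`, while `s1` and `s2` differ even though `project(s1, 0) == project(s2, 0)` and `project(s1, 1) == project(s2, 1)`. This is the kind of situation the abstract identifies as the obstacle to reusing the relational closure directly.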
Our solution to the implication problem also makes it possible to solve other important problems that occur in database design. We present polynomial-time algorithms to determine non-redundant covers of sets of FDs and to decide whether a given set of subattributes forms a superkey.
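To illustrate how both derived problems reduce to implication, here is a minimal sketch, again restricted to the flat relational case rather than the paper's nested setting. It assumes FDs are pairs of frozensets of attribute names; `closure`, `is_superkey` and `non_redundant_cover` are hypothetical helper names. A set X is a superkey when X⁺ covers the whole schema, and a non-redundant cover is obtained by repeatedly discarding any FD implied by the remaining ones.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical encoding: an FD X -> Y over flat attribute names (relational case only).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """Attribute closure X+ under a list of FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs: Set[str], schema: Set[str], fds: List[FD]) -> bool:
    """X is a superkey of the schema iff X+ contains every attribute of the schema."""
    return schema <= closure(attrs, fds)


def non_redundant_cover(fds: List[FD]) -> List[FD]:
    """Drop each FD implied by the remaining ones. The result is equivalent to the
    input, and no remaining FD is implied by the others, since implication is
    monotone and the cover only shrinks after an FD has been kept."""
    cover = list(fds)
    i = 0
    while i < len(cover):
        lhs, rhs = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if rhs <= closure(set(lhs), rest):
            cover = rest  # cover[i] is redundant; the next FD now sits at index i
        else:
            i += 1
    return cover
```

With F = {A → B, B → C, A → C}, the sketch discards A → C as redundant, reports {A} as a superkey of {A, B, C}, and rejects {B}. Each call performs at most one closure computation per FD, which keeps the relational version polynomial; the paper's contribution is retaining polynomial time once sets and multisets are present.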