What should a database know?

  • Authors:
  • Raymond Reiter

  • Affiliations:
  • Raymond Reiter, Department of Computer Science, University of Toronto, Toronto, Ontario, M5S 1A4, Canada and Canadian Institute for Advanced Research

  • Venue:
  • Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
  • Year:
  • 1988

Abstract

The by now conventional perspective on databases, especially deductive databases, is that they are sets of first order sentences. As such, they can be said to be claims about the truths of some external world; the database is a symbolic representation of that world.

While agreeing with this account of what a database is, I disagree with how, both in theory and practice, a database is used, specifically how it is queried and how its integrity is enforced.

Virtually all approaches to database query evaluation treat queries as first order formulas, usually with free variables whose bindings resulting from the evaluation phase define the answers to the query. The sole exception to this is the work of Levesque (1981, 1984), who argues that queries should be formulas in an epistemic modal logic. Queries, in other words, should be permitted to address aspects of the external world as represented in the database, as well as aspects of the database itself, i.e. aspects of what the database knows. To take a simple example, suppose DB = p ∨ q.

  Query: p (i.e. is p true in the external world?)
  Answer: unknown
  Query: Kp (i.e. do you know whether p is true in the external world?)
  Answer: no

Levesque's modal logic (called KFOPCE) distinguishes between known and unknown individuals in the database and thus accounts for “regular” database values as well as null values.
For example, if KB is

  {Teach(John, Math100), (∃x) Teach(x, CS100), Teach(Mary, Psych100) ∨ Teach(Sue, Psych100)},

then

  Query: (∃x) K Teach(John, x), i.e. is there a known course which John teaches?
  Answer: yes, Math100
  Query: (∃x) K Teach(x, CS100), i.e. is there a known teacher for CS100?
  Answer: no
  Query: (∃x) Teach(x, Psych100), i.e. does anyone teach Psych100?
  Answer: yes, Mary or Sue
  Query: (∃x) K Teach(x, Psych100), i.e. is there a known teacher of Psych100?
  Answer: no

Levesque (1981, 1984) provides a semantics for his language KFOPCE. FOPCE is the first order language KFOPCE without the modal K. Levesque proposes that a database is best viewed as a set of FOPCE sentences, and that it be queried by sentences of KFOPCE. He further provides a (noneffective) way of answering database queries.

Recently I have considered the concept of a static integrity constraint in the context of Levesque's KFOPCE (Reiter 1988). The conventional view of integrity constraints is that, like the database itself, they too are first order formulas (e.g. Lloyd & Topor (1985), Nicolas & Yazdanian (1978), Reiter (1984)). There are two definitions in the literature of a deductive database KB satisfying an integrity constraint IC.

  Definition 1: Consistency (e.g. Kowalski (1978), Sadri and Kowalski (1987)). KB satisfies IC iff KB + IC is satisfiable.
  Definition 2: Entailment (e.g. Lloyd and Topor (1985), Reiter (1984)). KB satisfies IC iff KB ⊨ IC.

Alas, neither definition seems correct. Consider a constraint requiring that employees have social security numbers:

  (∀x) emp(x) ⊃ (∃y) ss#(x, y)    (1)

1. Suppose KB = {emp(Mary)}. Then KB + IC is satisfiable. But intuitively, we want the constraint to require KB to contain a ss# entry for Mary, so we want IC to be violated. Thus Definition 1 does not capture our intuitions.
2. Suppose KB = { }. Intuitively, this should satisfy IC, but KB ⊭ IC.
So Definition 2 is inappropriate.

An alternative definition comes to mind when one sees that constraints like (1) intuitively are interpreted as statements not about the world but about the contents of the database, or about what it knows. Thus, using the modal K for “knows”, (1) should be rendered by

  (∀x) K emp(x) ⊃ (∃y) K ss#(x, y)

Other Examples

1. To prevent a database from simultaneously assigning the properties male and female to the same individual, use the constraint
  (∀x) ¬K (male(x) ∧ female(x))
2. To force a database to assign one of the properties male and female to each individual, use the constraint
  (∀x) K person(x) ⊃ K male(x) ∨ K female(x)
3. To require that known instances of the relation mother(,) have first argument a female person and a second argument a person, use the constraint
  (∀x, y) K mother(x, y) ⊃ K (person(x) ∧ female(x) ∧ person(y))
4. To require that every known employee have a social security number, without necessarily knowing what that number is (so that a null value is permitted), use
  (∀x) K emp(x) ⊃ K (∃y) ss#(x, y)

My account of integrity constraints, therefore, is that rather than being first order sentences, they are sentences of Levesque's KFOPCE. Constraints are not statements about the world, but about the contents of the database. The natural definition of when a database KB satisfies a constraint IC is the following:

  KB satisfies IC iff the answer to IC when viewed as a query to KB is “yes”.

In effect, therefore, my proposal is to understand integrity constraints as formally indistinguishable from KFOPCE queries, with the proviso that for any database state the answer to these queries must be “yes”.

My talk will elaborate on the above notion of deductive databases. Specifically, it will provide Levesque's semantics for the K operator, discuss query evaluation under the closed world assumption, characterize integrity checking for a natural class of integrity constraints and databases, and conclude with some open research topics.
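
The distinction between asking about the world and asking about what the database knows can be illustrated with a small propositional sketch in Python. It is only an approximation under stated assumptions, not Levesque's first order semantics for KFOPCE: the database is taken to be consistent and is represented by the set of all truth assignments that satisfy it, K φ is read as "φ holds in every such assignment", and the social security constraint is grounded for the single individual Mary with two hypothetical candidate numbers n1 and n2. All function and atom names (ask, models, emp_mary, ss_mary_n1, and so on) are invented for this illustration.

```python
from itertools import product

# Formulas are functions (world, epistemic_state) -> bool. A world maps atom
# names to truth values; the epistemic state is the list of all models of the KB.
def atom(name):
    return lambda w, e: w[name]

def neg(f):
    return lambda w, e: not f(w, e)

def disj(f, g):
    return lambda w, e: f(w, e) or g(w, e)

def implies(f, g):
    return lambda w, e: (not f(w, e)) or g(w, e)

def K(f):
    # "The database knows f": f holds in every world compatible with the KB.
    return lambda w, e: all(f(v, e) for v in e)

def models(atoms, kb):
    """Enumerate the truth assignments over `atoms` that satisfy every KB sentence."""
    worlds = (dict(zip(atoms, vals))
              for vals in product([False, True], repeat=len(atoms)))
    # KB sentences are assumed K-free, so the epistemic state is never consulted.
    return [w for w in worlds if all(s(w, []) for s in kb)]

def ask(atoms, kb, query):
    """Answer 'yes', 'no' or 'unknown'; assumes the KB is consistent."""
    e = models(atoms, kb)
    if all(query(w, e) for w in e):
        return "yes"
    if all(not query(w, e) for w in e):
        return "no"
    return "unknown"

# DB = p ∨ q
PQ = ["p", "q"]
db = [disj(atom("p"), atom("q"))]
answer_p = ask(PQ, db, atom("p"))       # is p true in the external world?
answer_Kp = ask(PQ, db, K(atom("p")))   # does the database know p?

# Grounded social security example (hypothetical atoms for Mary only).
ATOMS = ["emp_mary", "ss_mary_n1", "ss_mary_n2"]
emp, ss1, ss2 = (atom(a) for a in ATOMS)
has_ss = disj(ss1, ss2)                                   # (∃y) ss#(Mary, y)
ic_known_number = implies(K(emp), disj(K(ss1), K(ss2)))   # K emp ⊃ (∃y) K ss#
ic_null_allowed = implies(K(emp), K(has_ss))              # K emp ⊃ K (∃y) ss#

def satisfies(kb, ic):
    # KB satisfies IC iff the answer to IC, viewed as a query, is "yes".
    return ask(ATOMS, kb, ic) == "yes"
```

Under these assumptions, answer_p is "unknown" and answer_Kp is "no", matching the DB = p ∨ q example. The empty database satisfies ic_known_number, the database {emp(Mary)} violates it, and a database that records only that Mary has one of the two numbers (a null value) violates ic_known_number but satisfies ic_null_allowed.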