Definable relations and first-order query languages over strings

  • Authors: Michael Benedikt; Leonid Libkin; Thomas Schwentick; Luc Segoufin
  • Affiliations: Bell Labs, Naperville, Illinois; University of Toronto, Toronto, Ontario, Canada; University of Marburg, Marburg, Germany; INRIA-Rocquencourt, Le Chesnay Cedex, France
  • Venue: Journal of the ACM (JACM)
  • Year: 2003

Abstract

We study analogs of classical relational calculus in the context of strings. We start by studying string logics. Taking a classical model-theoretic approach, we fix a set of string operations and look at the resulting collection of definable relations. These form an algebra: a class of n-ary relations for every n, closed under projection and Boolean operations. We show that by choosing the string vocabulary carefully, we get string logics that have desirable properties: computable evaluation and normal forms. We identify five distinct models and study the differences in their model theory and complexity of evaluation. We identify a subset of these models that have additional attractive properties, such as finite VC dimension and quantifier elimination.

Once you have a logic, the addition of free predicate symbols gives you a string query language. The resulting languages have attractive closure properties from a database point of view: while SQL does not allow the full composition of string pattern-matching expressions with relational operators, these logics yield compositional query languages that can capture common string-matching queries while remaining tractable. For each of the logics studied in the first part of the article, we study properties of the corresponding query languages. We give bounds on the data complexity of queries, extend the normal form results from logics to queries, and show that the languages have corresponding algebras expressing safe queries.
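
As a rough illustration of what "closed under projection and Boolean operations" means for relations over strings, the following Python sketch builds a few relations over a deliberately tiny universe. It is not the construction used in the article. The alphabet, the length bound `MAX_LEN`, and all function names are assumptions introduced here so that every relation is a finite set that can be enumerated.

```python
from itertools import product

ALPHABET = "ab"
MAX_LEN = 2  # assumption: truncate the set of all strings so relations are finite

# All strings over ALPHABET of length at most MAX_LEN.
UNIVERSE = [
    "".join(chars)
    for n in range(MAX_LEN + 1)
    for chars in product(ALPHABET, repeat=n)
]


def relation(arity, predicate):
    """Materialize an n-ary relation as the set of tuples satisfying predicate."""
    return {t for t in product(UNIVERSE, repeat=arity) if predicate(*t)}


def complement(rel, arity):
    """Boolean negation: all tuples of the given arity that are not in rel."""
    return set(product(UNIVERSE, repeat=arity)) - rel


def project_out_last(rel):
    """Projection: existentially quantify the last coordinate."""
    return {t[:-1] for t in rel}


# Two basic binary relations: "x is a prefix of y" and "x equals y".
prefix = relation(2, lambda x, y: y.startswith(x))
equal = relation(2, lambda x, y: x == y)

# Boolean combination: x is a strict prefix of y.
strict_prefix = prefix & complement(equal, 2)

# Projection: the unary relation "there exists y such that x is a strict prefix of y".
extendable = project_out_last(strict_prefix)

print(sorted(extendable))
```

Over the full set of strings every string has a proper extension, so `extendable` excluding the strings of length `MAX_LEN` is purely an effect of the truncation assumed here.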