A database interface for file update

  • Authors:
  • Serge Abiteboul;Sophie Cluet;Tova Milo

  • Affiliations:
  • -;I.N.R.I.A., 78153 Le Chesnay Cedex, France;Computer Systems Research Institute, U. of Toronto, Toronto, Canada, M5S 1A1

  • Venue:
  • SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
  • Year:
  • 1995

Abstract

Database systems are concerned with structured data. Unfortunately, data is still often available in an unstructured manner (e.g., in files) even when it does have a strong internal structure (e.g., electronic documents or programs). In a previous paper [2], we focused on the use of high-level query languages to access such files and developed optimization techniques to do so. In this paper, we consider how structured data stored in files can be updated using database update languages.

The interest of using database languages to manipulate files is twofold. First, it opens database systems to external data. This concerns data residing in files or data transiting on communication channels and possibly coming from other databases [2]. Secondly, it provides high-level query/update facilities to systems that usually rely on very primitive linguistic support. (See [6] for recent work in this direction.) Similar motivations appear in [4, 5, 7, 8, 11, 12, 13, 14, 15, 17, 19, 20, 21].

In a previous paper, we introduced the notion of structuring schemas as a means of providing a database view on structured data residing in a file. A structuring schema consists of a grammar together with semantic actions (in a database language). We also showed how queries on files expressed in a high-level query language (O2-SQL [3]) could be evaluated efficiently using variations of standard database optimization techniques. The problem of update was mentioned there but remained largely unexplored. This is the topic of the present paper.

We argue that updates on files can be expressed conveniently using high-level database update languages that work on the database view of the file. The key problem is how to propagate an update specified on the database (here a view) to the file (here the physical storage).
As a first step, we propose a naive way of update propagation: the database view of the file is materialized; the update is performed on the database; the database is "unparsed" to produce an updated file. For this, we develop an unparsing technique. The problems that we meet while developing this technique are related to the well-known view update problem. (See, for instance, [9, 10, 16, 23].) The technique relies on the existence of an inverse mapping from the database to the file. We show that the existence of such an inverse mapping results from the use of restricted structuring schemas.

The naive technique presents two major drawbacks. It is inefficient: it entails intense data construction and unparsing, most of which deals with data not involved in the update. It may result in information loss: information in the file that is not recorded in the database may be lost in the process. The major contribution of this paper is a combination of techniques that minimizes both the data construction and the unparsing work. First, we briefly show how optimization techniques from [2] can be used to focus on the relevant portion of the database and to avoid constructing the entire database. Then we show that for a class of structuring schemas satisfying a locality condition, it is possible to carefully circumscribe the unparsing.

Some of the results in the paper are negative. They should not come as a surprise since we are dealing with complex theoretical foundations: language theory (for parsing and unparsing) and first-order logic (for database languages). However, we do present positive results for particular classes of structuring schemas. We believe that the restrictions imposed on these schemas are very acceptable in practice. (For instance, all "real" examples of structuring schemas that we examined are local.)

The paper is organized as follows.
In Section 2, we present the update problem and the structuring schemas; in Section 3, a naive technique for update propagation and the unparsing technique. Section 4 introduces a locality condition and presents a more efficient technique for propagating updates in local structuring schemas. The last section is a conclusion.
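
To make the contrast between naive propagation and circumscribed unparsing concrete, here is a minimal Python sketch. It is not the paper's algorithm or its structuring-schema formalism: the "key = value" file format, the `ENTRY` pattern, and the functions `parse`, `unparse`, `naive_update`, and `local_update` are all hypothetical names chosen for illustration. The sketch assumes each database entry corresponds to one contiguous span of the file, which is only a loose stand-in for the locality condition mentioned above.

```python
import re

# Hypothetical toy format: one "key = value" entry per line. Lines starting
# with '#' are comments that the database view does not record.
ENTRY = re.compile(r"^(?P<key>\w+)\s*=\s*(?P<value>.*)$")


def parse(text):
    """Build a toy database view: key -> (value, start offset, end offset)."""
    view = {}
    offset = 0
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\n")
        match = ENTRY.match(body)
        if match:
            view[match.group("key")] = (
                match.group("value"), offset, offset + len(body))
        offset += len(line)
    return view


def unparse(view):
    """Inverse mapping: regenerate a whole file from the view alone."""
    return "".join(f"{key} = {value}\n" for key, (value, _, _) in view.items())


def naive_update(text, key, new_value):
    """Materialize the view, update it, then unparse everything."""
    view = parse(text)
    _, start, end = view[key]
    view[key] = (new_value, start, end)
    return unparse(view)


def local_update(text, key, new_value):
    """Rewrite only the span of the file that produced the updated entry."""
    _, start, end = parse(text)[key]
    return text[:start] + f"{key} = {new_value}" + text[end:]


if __name__ == "__main__":
    original = "# settings\ntitle = Draft\nyear = 1994\n"
    print(naive_update(original, "year", "1995"))  # comment line is lost
    print(local_update(original, "year", "1995"))  # comment line survives
```

In this toy setting the naive path drops the comment line because the view never recorded it, while the local path leaves every byte outside the updated span untouched. The sketch still scans the whole file to find the span; avoiding the construction of the entire database is a separate optimization that it does not attempt.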