The future of heterogeneous databases

  • Authors:
  • W. Litwin

  • Affiliations:
  • INRIA 78153, Le Chesnay, France

  • Venue:
  • ACM '87 Proceedings of the 1987 Fall Joint Computer Conference on Exploring technology: today and tomorrow
  • Year:
  • 1987

Abstract

Since its introduction two decades ago, the concept of a database system has had a double goal. On the one hand, it was intended to provide a homogeneous picture of the data relevant to an enterprise as a whole. Through this picture, different parts of the data, previously residing in independently designed files and representing overlapping heterogeneous information, were supposed to be integrated into one database system. The envisioned mechanism for this homogenization was the concept of a global (unique) conceptual schema. This schema was supposed to be defined by a dedicated authority, the database administrator (DBA). On the other hand, the DBA was supposed to define views, which would present the users with heterogeneous data. Views were intended to give users the illusion of names, data structures and value types different from the unified ones, as required by particular needs. The whole idea is beautifully illustrated on the cover of /ULL83/. The DBA is the cook with ingredients in his hand. The data is the roast chicken, while the views are various smells of the chicken.

While the intention to support heterogeneous data was present in the concept from the beginning, its implementation was slow to follow. One must indeed consider many issues. Heterogeneity can arise at the system level, between data models, and as semantic heterogeneity of data within the same model, i.e. inconsistency in schema definition, in naming or in values. While centralized database systems proved to be a reasonable tool for homogenization, they performed poorly with heterogeneity. In particular, theoretical work has shown that much less can be done with views than with actual data, especially in the presence of updates (after all, it is easier to bite a chicken than a smell). Also, the function of the DBA became very difficult as we started moving towards Very Large Databases (VLDBs).
The conceptual schema was frequently designed so as to favor some users over others with regard to logical design and performance or, even worse, sometimes no user had the optimal design. In general, the goal of a single VLDB for a whole enterprise remained a dream.

To enhance the performance of a VLDB, the idea of a distributed database system emerged about a decade ago. Better performance should result from the physical distribution of the logically centralized database over more than one site. However, people working on this idea progressively observed that it should rather be oriented towards a new goal: cooperation between independently designed databases. These databases could be heterogeneous, each responding to the needs of a particular class of users. Some researchers furthermore came to the conclusion that a single global schema over all such data, as in a classical database, will usually remain a dream. The best-known corresponding work concerns multidatabase systems /LIT82/, federated databases /HEI85/, open systems /HEW86/ and interoperable database systems /INT87/. All these studies follow a common principle: databases should be able to exchange data and to be manipulated together, without being totally integrated. To achieve this new goal, researchers investigated new functions for database languages /LIT86/, and tools for the integrated definition of views of subcollections of accessible databases /DAY85/, /NAV86/,…

The work has shown that a new, and what we believe to be the right, approach to heterogeneous database system design has emerged /DAI87/. This approach is as follows:

  • With respect to the system level, one should stick to the OSI model standard protocols,…
  • For multidatabase manipulations, and generally for interoperability at the database level, heterogeneous data models should be translated into a standard (canonical) data model. This should be done through local translators (gateways in INGRES/STAR terminology).
  • Presently, the standard model should be the relational one. It should use ISO-SQL, extended with new functions. These should in particular allow the use of logical database names, in selecting a database or in qualifying an ambiguous relation name. The database names should be distinguished from the physical site names (e.g. database AIR-FRANCE at site GCAM), which should be transparent at the logical level. Further extensions to ISO-SQL should deal with request broadcasting, interdatabase data exchange, long transactions,… /LIT87/.
  • Query and data exchange protocols should be designed for transactions at the standard model level. Sites should have adaptors (gateways) accommodating these protocols to local systems. The protocols should be extensible to incorporate extensions to the standard model, as database technology moves fast. They should also allow systems other than database systems to be federated progressively, to achieve the goal of Interoperable Information Systems.
  • Particular attention should be devoted to high-speed local networks linking a large number of personal databases. They will contain the majority of databases (one usually prefers his own chicken).
  • All this work should be an effort to refine the Presentation and Application layers of the OSI model.

These principles characterize the commercial systems INGRES-STAR and SYBASE, the European effort inside the Distributed Aspects of Information Systems (DAISY) Working Group, the national project on Interoperable Systems in Japan, our own research work and some other efforts.
If this trend continues, there is every reason to believe that truly operational, and maybe worldwide, heterogeneous database systems will be available in five to ten years from now.
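
The role of logical database names in the approach described above can be illustrated with a small sketch. It is not taken from the paper: the classes `Gateway` and `Multidatabase`, the `DATABASE.relation` qualification syntax, the second database `RAIL` at site `SITE-2`, and the relation names are all hypothetical, and plain Python lists of dictionaries stand in for the canonical relational model. Only the pairing of database AIR-FRANCE with site GCAM comes from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Hypothetical local translator exposing local data as relations.

    Each relation is a list of dict rows, standing in for the canonical
    (relational) model; a real gateway would translate a local data model.
    """
    site: str                                   # physical site name, never used in queries
    relations: dict = field(default_factory=dict)

    def scan(self, relation):
        # Return a copy of the row list so callers cannot reorder the original.
        return list(self.relations[relation])


class Multidatabase:
    """Hypothetical catalog mapping logical database names to gateways."""

    def __init__(self):
        self._gateways = {}

    def register(self, db_name, gateway):
        self._gateways[db_name] = gateway

    def resolve(self, name):
        """Return the (database, relation) pairs that a name may denote."""
        if "." in name:                          # qualified form: DATABASE.relation
            db, rel = name.split(".", 1)
            if rel not in self._gateways[db].relations:
                raise KeyError(name)
            return [(db, rel)]
        return [(db, name) for db, gw in self._gateways.items()
                if name in gw.relations]

    def select(self, name):
        """Scan one relation; a bare name must be unambiguous."""
        matches = self.resolve(name)
        if len(matches) != 1:
            raise LookupError(f"{name!r} matches {len(matches)} relations; qualify it")
        db, rel = matches[0]
        return self._gateways[db].scan(rel)

    def broadcast(self, relation):
        """Send the same scan to every database holding the relation."""
        return {db: self._gateways[db].scan(rel)
                for db, rel in self.resolve(relation)}


mdb = Multidatabase()
mdb.register("AIR-FRANCE", Gateway("GCAM", {"timetable": [{"no": "AF001"}]}))
mdb.register("RAIL", Gateway("SITE-2", {"timetable": [{"no": "R9"}],
                                        "stations": [{"name": "Paris"}]}))

print(mdb.select("AIR-FRANCE.timetable"))   # qualified name removes the ambiguity
print(mdb.select("stations"))               # bare name is fine when unique
print(mdb.broadcast("timetable"))           # same request sent to both databases
```

In this sketch queries mention only logical database names; the site names are stored in the gateways and never appear in a request, which is one way to read the requirement that sites be transparent at the logical level. A bare `timetable` passed to `select` raises `LookupError` because two databases hold a relation of that name.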