When will we have true heterogeneous database systems

  • Authors:
  • A. P. Sheth

  • Affiliations:
  • Honeywell Corporate System Development Division, 1000 Boone Ave North, Golden Valley, MN

  • Venue:
  • ACM '87 Proceedings of the 1987 Fall Joint Computer Conference on Exploring technology: today and tomorrow
  • Year:
  • 1987

Abstract

Systems that can minimally qualify as heterogeneous database systems already exist; however, there is a long way to go before we have true heterogeneous database systems. Before we can determine when this will happen, we must agree on what a true heterogeneous database system is.

There are two dimensions of a heterogeneous database system. The first dimension is the type of heterogeneities handled by a heterogeneous database system. Figure 1 shows the major types of heterogeneities that such a system may be required to hide. Researchers and developers have been working on heterogeneities in hardware/systems, communications and operating systems for many years now. Earlier prototype and recent commercial (homogeneous) distributed DBMSs have solved the practical problems resulting from heterogeneities of these types. The more significant challenge now is to solve the problems related to the heterogeneities at the database level.

A heterogeneous database system may need to be constructed from multiple existing or new centralized DBMSs. These DBMSs may use different data models (relational, hierarchical, network, etc.). Even if they use the same data model, they may be implemented differently by different vendors. The data types that need to be managed by a heterogeneous database management system may depend on the application environment and may be significantly different, warranting drastically different data management strategies. For example, a heterogeneous database management system for a factory may require integration of three types of databases: one for design and engineering data (e.g., geometric data), one for business data (e.g., factory planning and scheduling data), and one for shop floor data (e.g., sensor and control data). Data management requirements may be significantly different in each of the three databases. Transactions in the business database, which has been the focus of most existing R&D in database management, typically have a short life span (a few seconds to a few minutes). Data items in the business database are also small and can be treated as atomic objects. On the other hand, data items in the technical database can be voluminous, and a transaction can last for a long time, often a few hours. Treating the data items as atomic objects often is not feasible in the technical domain. In contrast with other domains, data on the shop floor may be in very small units, may require real-time access, and may not require traditional database functionality such as concurrency control and recovery. A major challenge for building a heterogeneous database is the integration of such diverse databases.

The second dimension of a heterogeneous database system is its functionality. One perspective is that a heterogeneous database system should provide at least all the functionality that is typically expected of a homogeneous distributed database system. In other words, a heterogeneous database system should allow location-transparent, ad hoc and multisite queries and updates, along with concurrency control for replicated data and fault tolerance. Additionally, a heterogeneous database system must provide uniform access to heterogeneous databases using data and command translation. Furthermore, environments requiring a heterogeneous database system often require more autonomy of individual databases. A major engineering problem in building a heterogeneous database system is integrating systems such that changes made to existing systems and efficiency loss are minimized.

To guess when we will have true heterogeneous database systems, let us first explore the heterogeneity dimension. The current state of the art is probably sufficient to hide heterogeneities at the hardware/system, communication and operating system levels. Some of the previous prototypes and several recent commercially available distributed DBMS products run on multiple systems with different hardware and operating systems (e.g., RTI's INGRES-Star and Oracle's SQL*Star). Several prototypes provide interfaces to multiple DBMSs of the same data model (e.g., Mermaid [Templeton et al 83] and Multidatabase [Litwin and Abdellatif 86]). Using gateways, it is also possible to overcome the problem of multiple networks. A number of R&D efforts have focused on the problems related to multiple data models and multiple databases. A few prototype efforts (e.g., CCA's Multibase [Landers and Rosenberg 82] and Honeywell's Distributed Database Testbed System [Devor et al 82]) have shown the feasibility of creating heterogeneous database systems with heterogeneous data models and multiple DBMSs. Research in the areas of conceptual/canonical data models, database translation, schema translation and command translation has contributed greatly to this end. On the other hand, efforts to support different data types in a uniform way have been very limited (object-oriented databases hold some hope for solving this problem). Also, efforts to