Third generation TP monitors: a database challenge

  • Authors:
  • Umesh Dayal; Hector Garcia-Molina; Mei Hsu; Ben Kao; Ming-Chien Shan

  • Affiliations:
  • Hewlett-Packard Laboratories, Palo Alto, CA; Stanford University, Department of Computer Science, Stanford, CA; Digital Equipment Corporation, Mountain View, CA; Princeton University, Department of Computer Science, Princeton, NJ; Hewlett-Packard Laboratories, Palo Alto, CA

  • Venue:
  • SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
  • Year:
  • 1993

Abstract

In a 1976 book, “Algorithms + Data Structures = Programs” [15], Niklaus Wirth defined programs to be algorithms and data structures. Of course, by now we know that man does not live from programs alone, and that there is a second fundamental computer science equation: “Programs + Databases = Information Systems.”

Database researchers have traditionally focused on the database component of the equation, providing shared and persistent repositories for the data that programs need and produce. As a matter of fact, a lot of us have worked hard to hide or ignore the programs component. For instance, non-procedural languages like SQL and relational algebra have been the holy grail of the database field, letting us describe the data in the way we want without needing to write messy programs. The magic wand of transactions makes programs that execute concurrently with our non-procedural statements suddenly disappear: these other programs appear as atomic actions that are either executed before we started looking at our data, or will be executed after we are all done with our work. The wonders of fault tolerance and automatic recovery guarantee that we never have to concern ourselves with our statements failing or being interrupted. The data we need will always be there for us, and our statements will always run to completion.

Unfortunately, the real programs that operate on databases are in many cases more complex than the classical ones like “withdraw 100 dollars from my account” or “find me all my blue-eyed great-grandfathers.” For one, programs may be much longer, requiring many database interactions. Furthermore, programs need to interact with other concurrent programs, passing results to and from them. They may also need to be aware of their environment, perhaps monitoring the execution of another program, or taking corrective action when some system components fail.

Of course, this is not to say that transactions and non-procedural query languages have not been great contributions. In many cases, they are all that is needed to program one's application. But beyond that there are many cases when one must deal with multiple concurrent applications. Indeed, a critical problem facing complex enterprises is the automation of complex business processes. Enterprises today are drowning in an ocean of data, with a few isolated islands of automation consisting of heterogeneous databases and legacy application programs, each of which automates some point function (e.g., order entry, inventory, accounting, billing) within the enterprise. As the enterprise attempts to automate its business processes, these isolated islands have to be bridged: complex information systems must be developed that span many of these databases and application programs. Traditional database systems do not provide the supporting environment for this.

Our programming languages colleagues have been working on the programs component of our fundamental equation, but the database component has traditionally been ignored or hidden. There has been a lot of recent interest in languages that support persistent objects, but often the goal is to make the database that holds the objects look as little as possible like a database. That is, the persistent objects are to be handled just as if they were volatile objects, even though they are not. Also, the programming languages researchers have borrowed the notions of transactions and serializable schedules to hide, as much as possible, concurrent execution and failures of programs.

Finally, traditional programming languages (there are exceptions [4, 13]) have focused on “programming in the small,” as opposed to “programming in the large.” The goal of the former is to program single applications or to solve single problems, as opposed to programming an entire enterprise and all of its interacting applications.

Researchers from both camps have recently been addressing both components of the “Programs + Databases” equation. For example, database researchers have been adding triggers and procedures to database objects [2], resulting in so-called active databases. These are important steps in the right direction (other related steps are listed below), but still do not address the full programming-in-the-large problem.

In our opinion, the only software providers that have tackled both components of the “Programs + Databases” equation, and have a proven track record with real applications, are the Transaction Processing Monitor (TPM) builders [9].