Caching Derived Data in Object-Oriented Databases, and An Intelligent System Design for Selecting Their Materialization Strategies

  • Authors:
  • William E Voss

  • Affiliations:
  • -

  • Venue:
  • Caching Derived Data in Object-Oriented Databases, and An Intelligent System Design for Selecting Their Materialization Strategies
  • Year:
  • 1998

Abstract

I explore how an active database system could better support derived data. I present an intelligent database design which engages in database tuning by adaptively responding to usage patterns in making data materialization decisions. I also present algorithms supporting incremental recomputation of derived data, and update propagation after deriver mutation events.
Besides retrieving explicitly stored raw data, database queries often involve data which may be computed or derived from other data already stored in a database. For example, if a table of numbers is stored in a database, queries may require the table's total. Total could be recomputed whenever needed, it could be cached, or a running-total could be maintained. If total is materialized, then to enforce correctness, all updates of the underlying table (the deriver) must trigger corresponding updates to the derived value total. I present an algorithm for safely matching derived values and derivers in behaviorally-object-oriented database systems. Using this algorithm as the foundation, I created a design which permits databases to safely cache the results of computing derived values without relaxing correctness criteria. If any of the derived value's derivers have mutated, such caches will always be invalidated before being accessed. I further extended this design to support "delders," derived values, such as running-totals, which are maintained incrementally based on "delta" values.
Reducing parts of a query to cache hits can be a very effective query optimization. To recompute from scratch, cache, or maintain a running-total is a database optimization question. The best answer depends in part upon usage patterns. For example, is total accessed frequently, or is the underlying table updated frequently? These usage patterns cannot necessarily be accurately predicted during application design.
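
The three choices named above for the table total (recompute on demand, cache with invalidation, maintain a running total from deltas) can be illustrated with a minimal Python sketch. It is not the design from the thesis: the class names `Table`, `RecomputedTotal`, `CachedTotal`, and `RunningTotal`, and the callback-based notification scheme, are assumptions made purely for illustration.

```python
class Table:
    """Hypothetical deriver: a bag of numbers that reports each mutation as a delta."""

    def __init__(self, values=()):
        self._values = list(values)
        self._listeners = []

    def subscribe(self, listener):
        # listener is called with the signed change to the total on every mutation
        self._listeners.append(listener)

    def values(self):
        return list(self._values)

    def append(self, x):
        self._values.append(x)
        self._notify(+x)

    def remove(self, x):
        self._values.remove(x)
        self._notify(-x)

    def _notify(self, delta):
        for listener in self._listeners:
            listener(delta)


class RecomputedTotal:
    """No materialization: every access recomputes from scratch."""

    def __init__(self, table):
        self._table = table

    def get(self):
        return sum(self._table.values())


class CachedTotal:
    """Cached result, invalidated whenever the deriver mutates."""

    def __init__(self, table):
        self._table = table
        self._valid = False
        self._value = None
        table.subscribe(self._invalidate)

    def _invalidate(self, _delta):
        self._valid = False  # the stale value is never served after a mutation

    def get(self):
        if not self._valid:
            self._value = sum(self._table.values())
            self._valid = True
        return self._value


class RunningTotal:
    """Incrementally maintained value (a "delder" in the abstract's terms)."""

    def __init__(self, table):
        self._value = sum(table.values())
        table.subscribe(self._apply)

    def _apply(self, delta):
        self._value += delta  # update from the delta, no rescan of the table

    def get(self):
        return self._value
```

All three objects return the same total after any sequence of `append` and `remove` calls; they differ only in where the work is done (at read time, at the first read after a mutation, or at update time).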
I describe materialization cost estimates based on having the database observe and collect data on actual usage patterns and on the actual costs observed when implementing different choices. I designed an intelligent system which dynamically makes materialization choices using these cost estimates. The system is adaptive, and switches between the possible materialization choices as usage patterns change.
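
As a rough illustration of cost-based selection, the sketch below picks the cheapest strategy for one observation window. The cost formulas are simplifying assumptions, not the estimates described in the thesis, and the names `UsageWindow`, `ObservedCosts`, `estimate_costs`, and `choose_strategy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UsageWindow:
    reads: int    # accesses to the derived value observed in the window
    updates: int  # mutations of the deriver observed in the window


@dataclass
class ObservedCosts:
    recompute: float    # one from-scratch computation
    invalidate: float   # marking a cache stale on one update
    apply_delta: float  # applying one delta to a maintained value
    lookup: float       # reading an already stored value


def estimate_costs(usage, costs):
    # Assumption: a cache misses at most once per update and at most once per read,
    # so min(reads, updates) is used as a pessimistic miss count.
    misses = min(usage.reads, usage.updates)
    return {
        "recompute": usage.reads * costs.recompute,
        "cache": (usage.updates * costs.invalidate
                  + misses * costs.recompute
                  + usage.reads * costs.lookup),
        "running_total": (usage.updates * costs.apply_delta
                          + usage.reads * costs.lookup),
    }


def choose_strategy(usage, costs):
    # Pick the strategy with the lowest estimated cost for this window.
    estimates = estimate_costs(usage, costs)
    return min(estimates, key=estimates.get)
```

Calling `choose_strategy` again for each new window lets the choice change as the mix of reads and updates shifts, which is the adaptive behavior the abstract describes. Under this simplified model, recomputation tends to win when updates far outnumber reads, and caching tends to win over incremental maintenance only when applying a delta is expensive.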