A declarative approach to optimize bulk loading into databases

  • Authors:
  • Sihem Amer-Yahia;Sophie Cluet

  • Affiliations:
  • AT&T Labs--Research, USA;INRIA, France

  • Venue:
  • ACM Transactions on Database Systems (TODS)
  • Year:
  • 2004

Abstract

Applications such as warehouse maintenance need to load large data volumes regularly. The efficiency of loading depends on the resources that are available at the source and target systems. Our work aims to understand the performance criteria that are involved in bulk loading data into a database and to devise tailored optimization strategies.

Unlike commercial systems and previous research on the same topic, our approach follows the fundamental database principle of physical-logical independence. A loading program is represented as a sequence of algebraic expressions. This abstraction enables the use of appropriate algebraic rewritings to optimize a loading program and of a cost model that takes into consideration efficiency criteria such as the processing times at the source and target systems and the bandwidth between them. A slow-loading program may be preferable if it does not slow down other applications by consuming too much memory. Thus, we view the problem of optimizing a loading program as finding a compromise between several efficiency criteria.

Representing loading programs in an algebra and performance criteria in a cost model yields two very desirable properties: reusability and efficiency. Database programmers do not have to write loading programs by hand. In addition, tuning loading programs becomes easier since programmers have better control over the performance criteria specified in the cost model. The algebra captures data transformations that would otherwise have been hardcoded in loading programs. Consequently, richer optimizations can be explored. Finally, our optimization techniques are not specific to one particular system. They can be used for loading data from and to any structured store (e.g., relational, structured files).

We implemented our ideas in a complete environment for migrating ODBC-compliant databases into the O2 object-oriented database system. This prototype provides a declarative view language to specify loading; an interface to specify directives, such as the desired physical organization of the database, and constraints on several criteria, such as resource and bandwidth consumption; an algebraic optimizer; a code generator; and an execution environment to control failures and guarantee incremental loading. Our experiments show that a tailored optimization is necessary when loading large data volumes into a database.
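
The idea of treating optimization as a compromise between several efficiency criteria can be illustrated with a small sketch. It is not the cost model from the paper: the `LoadingPlan` fields, the weights, the memory limit, and all numbers below are hypothetical, chosen only to show how a slower plan may be selected once a resource constraint is imposed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadingPlan:
    """A candidate loading program with estimated costs (all values hypothetical)."""
    name: str
    source_time_s: float    # processing time at the source system
    transfer_time_s: float  # time spent moving data between the systems
    target_time_s: float    # processing time at the target system
    peak_memory_mb: float   # memory consumed at the target while loading


def weighted_cost(plan, weights):
    """Combine the time criteria into one number using user-supplied weights."""
    return (weights["source"] * plan.source_time_s
            + weights["transfer"] * plan.transfer_time_s
            + weights["target"] * plan.target_time_s)


def choose_plan(plans, weights, memory_limit_mb):
    """Return the cheapest plan among those that respect the memory constraint."""
    feasible = [p for p in plans if p.peak_memory_mb <= memory_limit_mb]
    if not feasible:
        raise ValueError("no plan satisfies the memory constraint")
    return min(feasible, key=lambda p: weighted_cost(p, weights))


# Two made-up alternatives for the same logical loading program.
plans = [
    LoadingPlan("join-at-source", 120.0, 40.0, 60.0, 900.0),
    LoadingPlan("join-at-target", 30.0, 90.0, 150.0, 200.0),
]
weights = {"source": 1.0, "transfer": 1.0, "target": 1.0}

fast = choose_plan(plans, weights, memory_limit_mb=1024.0)
frugal = choose_plan(plans, weights, memory_limit_mb=512.0)
print(fast.name, frugal.name)
```

With a generous memory limit the faster plan is chosen; tightening the limit makes the slower but less memory-hungry plan the only feasible one, which mirrors the abstract's remark that a slow-loading program may be preferable.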