BI batch manager: a system for managing batch workloads on enterprise data-warehouses

  • Authors:
  • Abhay Mehta; Chetan Gupta; Umeshwar Dayal

  • Affiliations:
  • HP Labs; HP Labs; HP Labs

  • Venue:
  • EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
  • Year:
  • 2008

Abstract

Modern enterprise data warehouses have complex workloads that are notoriously difficult to manage. An important problem in workload management is running these complex workloads 'optimally'. Traditionally, this problem has been studied in the OLTP (Online Transaction Processing) context, where the MPL (Multi-Programming Level) is used as a knob to achieve optimality. However, MPL is a tricky knob in a BI (Business Intelligence) scenario: a low MPL can easily result in underload, while a high MPL can easily result in overload and 'thrashing'. In this work we present BI Batch Manager, a workload management system for running batches of queries 'optimally' on an Enterprise Data Warehouse (EDW). It comprises three components: an admission control component, a scheduler, and an execution control component. To avoid underload and overload automatically, we introduce a novel execution control mechanism, PGM (Priority Gradient Multiprogramming). In PGM, a priority gradient is created for the workload, with each query running at a distinctly different priority level. We demonstrate that this stabilizes the execution of a workload across a wide operating range. Our admission control policy uses memory as the controlling factor: it admits batches of queries whose total memory requirement equals the available memory on the system. Our scheduling policy, which gives the highest priority to the query with the largest memory requirement, further stabilizes execution. We validate BI Batch Manager using varying workloads on a commercial, enterprise-class DBMS, and show that it effectively avoids underload and overload (thrashing) and can automatically run BI workloads with 'optimal' performance.
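The three policies described above (memory-based admission, largest-memory-first scheduling, and a priority gradient) can be illustrated together. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the `Query` type, the `admit_batch` and `priority_gradient` functions, and the `top`/`step`/`lowest` parameters are hypothetical. Because filling memory exactly is a subset-sum problem, admission is approximated greedily here (largest memory first, deferring anything that would overflow the budget). Priorities are plain integers where a larger number means higher priority, standing in for whatever priority mechanism a real DBMS exposes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical query descriptor: an id and an estimated memory need in MB."""
    qid: str
    memory_mb: int


def admit_batch(pending: List[Query], available_mb: int) -> Tuple[List[Query], List[Query]]:
    """Admission control sketch: memory is the controlling factor.

    Assumption: a greedy fill that considers the largest-memory queries first
    and defers any query that would push the batch past available_mb.
    """
    admitted: List[Query] = []
    deferred: List[Query] = []
    used = 0
    for q in sorted(pending, key=lambda x: x.memory_mb, reverse=True):
        if used + q.memory_mb <= available_mb:
            admitted.append(q)
            used += q.memory_mb
        else:
            deferred.append(q)
    return admitted, deferred


def priority_gradient(batch: List[Query], top: int = 100, step: int = 10,
                      lowest: int = 1) -> List[Tuple[Query, int]]:
    """Scheduling + PGM sketch: every query gets a distinct priority level.

    The largest-memory query gets `top`; each following query is `step`
    levels lower, so no two queries share a priority (the gradient).
    """
    if step <= 0:
        raise ValueError("step must be positive to keep priority levels distinct")
    ordered = sorted(batch, key=lambda x: x.memory_mb, reverse=True)
    if ordered and top - (len(ordered) - 1) * step < lowest:
        raise ValueError("not enough distinct priority levels for this batch")
    return [(q, top - i * step) for i, q in enumerate(ordered)]


# Usage with made-up queries (illustrative numbers, not results from the paper).
pending = [Query("q1", 400), Query("q2", 1200), Query("q3", 300), Query("q4", 900)]
admitted, deferred = admit_batch(pending, available_mb=2048)
levels = priority_gradient(admitted)
for q, level in levels:
    print(q.qid, q.memory_mb, level)
# q2 1200 100
# q1 400 90
# q3 300 80
```

With 2048 MB available, q4 is deferred to a later batch and the remaining three queries run at priorities 100, 90 and 80. A greedy fill may leave some memory unused (148 MB in this example), so the sketch treats "equals the available memory" as a target rather than a guarantee.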