BI batch manager: a system for managing batch workloads on enterprise data-warehouses

Authors:
Abhay Mehta;Chetan Gupta;Umeshwar Dayal
Affiliations:
HP Labs;HP Labs;HP Labs
Venue:
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Year:
2008

Abstract

Modern enterprise data warehouses have complex workloads that are notoriously difficult to manage. An important problem in workload management is to run these complex workloads 'optimally'. Traditionally, this problem has been studied in the OLTP (Online Transaction Processing) context, where the MPL (Multi Programming Level) is used as a knob to achieve optimality. However, MPL is a tricky knob in a BI (Business Intelligence) scenario, since a low MPL can easily result in underload and a high MPL can easily result in overload and 'thrashing'. In this work we present BI Batch Manager, a workload management system that runs batches of queries 'optimally' on an Enterprise Data Warehouse (EDW). It comprises three components: an admission control component, a scheduler, and an execution control component. To avoid underload and overload automatically, we introduce a novel execution control mechanism, PGM (Priority Gradient Multiprogramming). In PGM, a priority gradient is created for the workload, with each query running at a distinct priority level. We demonstrate that this stabilizes the execution of a workload across a wide operating range. We use memory as the controlling factor for our admission control policy: batches of queries are admitted such that their total memory requirement equals the available memory on the system. Our scheduling policy, which gives the query with the largest memory requirement the highest priority, further stabilizes the execution. We validate BI Batch Manager using varying workloads on a commercial, enterprise-class DBMS. We show that it effectively avoids underload and overload (thrashing) and can automatically run BI workloads with 'optimal' performance.
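
The admission and scheduling policies described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each query arrives with a memory estimate, that admission greedily fills the available memory in queue order without exceeding it (an approximation of "equals the available memory"), and that priorities are plain integers. The names `Query`, `admit_batch`, `assign_priority_gradient`, and `top_priority` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Query:
    name: str        # hypothetical query identifier
    memory_mb: int   # assumed to be an estimate supplied by the caller


def admit_batch(pending: List[Query], available_mb: int) -> Tuple[List[Query], List[Query]]:
    """Greedily admit queries, in queue order, while their combined
    memory estimate still fits into the available memory.

    Returns (admitted, deferred). Greedy filling is an assumption of
    this sketch, not a policy stated in the abstract.
    """
    admitted: List[Query] = []
    deferred: List[Query] = []
    used_mb = 0
    for query in pending:
        if used_mb + query.memory_mb <= available_mb:
            admitted.append(query)
            used_mb += query.memory_mb
        else:
            deferred.append(query)
    return admitted, deferred


def assign_priority_gradient(batch: List[Query], top_priority: int = 100) -> List[Tuple[Query, int]]:
    """Give every query in the batch a distinct priority level, with the
    largest-memory query receiving the highest priority.

    The integer scale starting at top_priority is hypothetical.
    """
    # sorted() is stable, so queries with equal memory keep queue order.
    ordered = sorted(batch, key=lambda q: q.memory_mb, reverse=True)
    return [(query, top_priority - rank) for rank, query in enumerate(ordered)]


if __name__ == "__main__":
    pending = [Query("q1", 400), Query("q2", 700), Query("q3", 300), Query("q4", 200)]
    admitted, deferred = admit_batch(pending, available_mb=1000)
    for query, priority in assign_priority_gradient(admitted):
        print(query.name, query.memory_mb, priority)
    print("deferred:", [q.name for q in deferred])
```

In this sketch, a query that does not fit is deferred to a later batch rather than rejected. How the actual system estimates memory, handles deferred queries, and maps the gradient onto DBMS priority levels is not specified in the abstract.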