Compressing SQL workloads

Authors: Surajit Chaudhuri; Ashish Kumar Gupta; Vivek Narasayya
Affiliations: One Microsoft Way, Redmond, WA; University of Washington, Seattle, WA; One Microsoft Way, Redmond, WA
Venue: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data
Year: 2002

Abstract

Recently, work on several important relational database tasks, such as index selection, histogram tuning, approximate query processing, and statistics selection, has recognized the importance of leveraging workloads. These tasks are often given a large workload, i.e., a set of SQL DML statements, as input. A key factor affecting the scalability of such tasks is the size of the workload. In this paper, we introduce the novel problem of workload compression, which helps improve the scalability of such tasks. We present a principled solution to this challenging problem. Our solution is broadly applicable to a variety of workload-driven tasks while allowing task-specific knowledge to be incorporated. We have implemented this solution, and our experiments illustrate its effectiveness in the context of two workload-driven tasks: index selection and approximate query processing.
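To make the problem statement concrete, here is a minimal sketch in Python, assuming a workload is a list of SQL strings and that a task-specific distance function decides when two statements are interchangeable for the task at hand. The names `Statement`, `compress_workload`, `template_distance`, and the threshold `max_distance` are hypothetical illustrations. The greedy single-pass grouping below is a simple stand-in chosen for brevity, not the solution developed in the paper; each kept representative carries a weight recording how many original statements it stands for.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Statement:
    sql: str
    weight: float = 1.0  # how many original statements this one represents


def template(sql: str) -> str:
    # Hypothetical normalization: lowercase, then replace quoted strings
    # and numeric literals with '?' so statements differing only in constants match.
    return re.sub(r"'[^']*'|\b\d+(\.\d+)?\b", "?", sql.strip().lower())


def template_distance(a: Statement, b: Statement) -> float:
    # Toy task-specific distance (an assumption): 0 if same template, else 1.
    return 0.0 if template(a.sql) == template(b.sql) else 1.0


def compress_workload(
    workload: List[Statement],
    distance: Callable[[Statement, Statement], float],
    max_distance: float,
) -> List[Statement]:
    """Greedy sketch: fold each statement into its nearest kept representative
    if it lies within max_distance; otherwise keep it as a new representative."""
    reps: List[Statement] = []
    weights: Dict[int, float] = {}
    for stmt in workload:
        best_i: Optional[int] = None
        best_d = float("inf")
        for i, rep in enumerate(reps):
            d = distance(stmt, rep)
            if d < best_d:
                best_i, best_d = i, d
        if best_i is not None and best_d <= max_distance:
            weights[best_i] += stmt.weight  # absorbed: representative's weight grows
        else:
            reps.append(stmt)
            weights[len(reps) - 1] = stmt.weight
    return [Statement(r.sql, weights[i]) for i, r in enumerate(reps)]


workload = [
    Statement("SELECT * FROM orders WHERE o_id = 17"),
    Statement("SELECT * FROM orders WHERE o_id = 42"),
    Statement("SELECT name FROM customer WHERE city = 'Seattle'"),
    Statement("SELECT * FROM orders WHERE o_id = 99"),
]
compressed = compress_workload(workload, template_distance, max_distance=0.0)
for s in compressed:
    print(s.weight, s.sql)
```

Here four statements shrink to two weighted representatives, and the total weight still equals the original workload size, so a downstream task can treat the compressed workload as a weighted stand-in. Replacing `template_distance` with something richer, for example a comparison of the tables and columns each statement references when tuning indexes, is where task-specific knowledge would enter in this sketch.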