Index tuning for parameterized streaming groupby queries

Authors:
Luping Ding;Elke A. Rundensteiner
Affiliations:
Worcester Polytechnic Institute, Worcester, MA;Worcester Polytechnic Institute, Worcester, MA
Venue:
SSPS '08 Proceedings of the 2nd international workshop on Scalable stream processing system
Year:
2008

Abstract

Similar groupby queries are common in many stream processing applications. We propose the parameterized streaming groupby query template (PSGB template) as an abstraction for representing a potentially infinite number of runtime-instantiated groupby queries with customized results. To handle high-speed data streams and large numbers of PSGB queries, the IMP index is proposed for organizing the quickly evolving PSGB operator state to support query workloads. In this paper, we tackle the IMP index tuning problem. We propose the EPrune algorithm, which is guaranteed to find the optimal IMP index configuration for a given query workload. To support the frequent index tuning required to cope with dynamic stream environments, the efficiency of index selection becomes more important than guaranteed optimality. We therefore design a greedy index selection algorithm named RGreedy and equip it with three heuristics: OWL, PCL and Hybrid. Our experiments show that RGreedy finds the optimal IMP configuration in practically all of our extensive test cases. While EPrune takes hours to finish, RGreedy terminates within seconds.
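
The trade-off between an exhaustive, optimal search and a fast greedy search can be illustrated with a small generic sketch. This is not EPrune or RGreedy: the abstract does not describe those algorithms, the IMP index structure, or the OWL, PCL and Hybrid heuristics. The cost model, the constants, the attribute names and every function name below are assumptions made up for illustration.

```python
from itertools import combinations

# Hypothetical toy cost model (an assumption, not the paper's IMP cost model).
# A query is the set of attributes it constrains; an index is a tuple of
# attributes. An index can serve a query when every indexed attribute is
# constrained by that query, and each indexed attribute shrinks the scan.
STATE_SIZE = 1000.0   # assumed number of tuples held in operator state
SELECTIVITY = 0.1     # assumed fraction of tuples kept per indexed attribute
MAINTENANCE = 20.0    # assumed upkeep cost charged per index


def query_cost(query, config):
    """Cheapest scan cost for one query under a set of indexes."""
    best = STATE_SIZE  # full scan when no index applies
    for index in config:
        if set(index) <= query:
            best = min(best, STATE_SIZE * SELECTIVITY ** len(index))
    return best


def workload_cost(workload, config):
    """Frequency-weighted lookup cost plus index maintenance cost."""
    lookup = sum(freq * query_cost(q, config) for q, freq in workload)
    return lookup + MAINTENANCE * len(config)


def candidate_indexes(workload):
    """Every non-empty attribute subset of some workload query."""
    cands = set()
    for q, _ in workload:
        attrs = sorted(q)
        for k in range(1, len(attrs) + 1):
            cands.update(combinations(attrs, k))
    return sorted(cands)


def exhaustive_select(workload):
    """Try every subset of candidates: optimal, but exponential in their number."""
    cands = candidate_indexes(workload)
    best, best_cost = frozenset(), workload_cost(workload, frozenset())
    for k in range(1, len(cands) + 1):
        for combo in combinations(cands, k):
            cost = workload_cost(workload, combo)
            if cost < best_cost:
                best, best_cost = frozenset(combo), cost
    return best, best_cost


def greedy_select(workload):
    """Repeatedly add the single index that lowers total cost the most."""
    cands = candidate_indexes(workload)
    config, cost = set(), workload_cost(workload, set())
    while True:
        options = [(workload_cost(workload, config | {c}), c)
                   for c in cands if c not in config]
        if not options:
            break
        new_cost, pick = min(options)
        if new_cost >= cost:  # no remaining index pays for its upkeep
            break
        config.add(pick)
        cost = new_cost
    return frozenset(config), cost


# Hypothetical workload: (attributes constrained by the query, frequency).
workload = [
    (frozenset({"region", "product"}), 5),
    (frozenset({"region"}), 3),
    (frozenset({"product", "day"}), 2),
]
opt_config, opt_cost = exhaustive_select(workload)
greedy_config, greedy_cost = greedy_select(workload)
```

On this toy workload both searches select the same three indexes at a total cost of about 430, while the exhaustive search evaluates all 32 subsets of the five candidates and the greedy search evaluates only a handful of single-index extensions per round. A greedy search of this kind carries no optimality guarantee in general.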