Fast evaluation of iceberg pattern-based aggregate queries

Authors:
Zhian He;Petrie Wong;Ben Kao;Eric Lo;Reynold Cheng
Affiliations:
The Hong Kong Polytechnic University, Hong Kong, Hong Kong;The Hong Kong Polytechnic University, Hong Kong, Hong Kong;The University of Hong Kong, Hong Kong, Hong Kong;The Hong Kong Polytechnic University, Hong Kong, Hong Kong;The University of Hong Kong, Hong Kong, Hong Kong
Venue:
Proceedings of the 22nd ACM international conference on information & knowledge management
Year:
2013

Abstract

A Sequence OLAP (S-OLAP) system provides a platform on which pattern-based aggregate (PBA) queries on a sequence database are evaluated. In its simplest form, a PBA query consists of a pattern template T and an aggregate function F. A pattern template is a sequence of variables, each defined over a domain. For example, the template T = (X, Y, Y, X) consists of two variables, X and Y. Each variable is instantiated with every value in its domain to derive all possible patterns of the template, and sequences are grouped by the patterns they possess. The answer to a PBA query is a sequence cuboid (s-cuboid), a multidimensional array of cells in which each cell is associated with one pattern instantiated from the query's pattern template. The value of each s-cuboid cell is obtained by applying the aggregate function F to the set of data sequences that belong to that cell. Since a pattern template can involve many variables and can be arbitrarily long, the s-cuboid induced by a PBA query can be huge. For most analytical tasks, however, only iceberg cells, those with very large aggregate values, are of interest. This paper proposes an efficient approach to identify and evaluate the iceberg cells of s-cuboids. Experimental results show that our algorithms are orders of magnitude faster than existing approaches.
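
To make the terminology concrete, the following is a minimal Python sketch of a brute-force PBA evaluation that materializes every cell and then keeps only the iceberg cells. It is not the algorithm proposed in the paper, which is designed to avoid exactly this exhaustive enumeration. Several details are assumptions made for illustration only: a sequence "possesses" a pattern when the pattern occurs as a contiguous run, the aggregate function F is COUNT, a cell is an iceberg cell when its value meets a user-supplied threshold, and the names instantiate, contains, and iceberg_cells are hypothetical.

```python
from itertools import product


def instantiate(template, domains):
    """Yield (cell, pattern) pairs, one per binding of the template's variables.

    `cell` is the tuple of values bound to the variables in sorted name order;
    `pattern` is the template with every variable replaced by its bound value.
    """
    variables = sorted(set(template))
    for values in product(*(domains[v] for v in variables)):
        binding = dict(zip(variables, values))
        yield values, tuple(binding[v] for v in template)


def contains(sequence, pattern):
    """Return True if `pattern` occurs as a contiguous run in `sequence`.

    Contiguous matching is an assumption of this sketch.
    """
    n = len(pattern)
    return any(
        tuple(sequence[i:i + n]) == pattern
        for i in range(len(sequence) - n + 1)
    )


def iceberg_cells(sequences, template, domains, threshold, aggregate=len):
    """Naive baseline: evaluate every s-cuboid cell, keep those >= threshold.

    `aggregate` plays the role of F; the default `len` corresponds to COUNT.
    """
    result = {}
    for cell, pattern in instantiate(template, domains):
        members = [s for s in sequences if contains(s, pattern)]
        value = aggregate(members)
        if value >= threshold:
            result[cell] = value
    return result


if __name__ == "__main__":
    sequences = ["abba", "cabbad", "baab", "abab"]
    template = ("X", "Y", "Y", "X")
    domains = {"X": "ab", "Y": "ab"}
    # Cells are keyed by the (X, Y) values of the instantiated pattern.
    print(iceberg_cells(sequences, template, domains, threshold=2))
```

The number of cells visited by this baseline is the product of the domain sizes of the variables, which is why the full s-cuboid becomes impractical to materialize as templates grow, and why restricting attention to iceberg cells is attractive.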