E-Cube: multi-dimensional event sequence analysis using hierarchical pattern query sharing

Authors:
Mo Liu;Elke Rundensteiner;Kara Greenfield;Chetan Gupta;Song Wang;Ismail Ari;Abhay Mehta
Affiliations:
Worcester Polytechnic Institute, Worcester, MA, USA;Worcester Polytechnic Institute, Worcester, MA, USA;Worcester Polytechnic Institute, Worcester, MA, USA;Hewlett-Packard Labs, Palo Alto, CA, USA;Hewlett-Packard Labs, Palo Alto, CA, USA;Ozyegin University, Istanbul, Turkey;Hewlett-Packard Labs, Austin, TX, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011



Abstract

Many modern applications, including online financial feeds, tag-based mass transit systems, and RFID-based supply chain management systems, transmit real-time data streams. There is a need for event stream processing technology to analyze this vast amount of sequential data to enable online operational decision making. Existing techniques such as traditional online analytical processing (OLAP) systems are not designed for real-time pattern-based operations, while state-of-the-art Complex Event Processing (CEP) systems designed for sequence detection do not support OLAP operations. We propose a novel E-Cube model that combines CEP and OLAP techniques for efficient multi-dimensional event pattern analysis at different abstraction levels. Our analysis of the interrelationships in both concept abstraction and pattern refinement among queries facilitates the composition of these queries into an integrated E-Cube hierarchy. Based on this E-Cube hierarchy, strategies of drill-down (refinement from abstract to more specific patterns) and of roll-up (generalization from specific to more abstract patterns) are developed for efficient workload evaluation. Our proposed execution strategies reuse intermediate results along both the concept and the pattern refinement relationships between queries. Based on this foundation, we design a cost-driven adaptive optimizer called Chase that exploits the above reuse strategies for optimal E-Cube hierarchy execution. Our experimental studies comparing alternate strategies on a real-world financial data stream under different workload conditions demonstrate the superiority of the Chase method. In particular, our Chase execution in many cases performs tenfold faster than the state-of-the-art strategy for real stock market query workloads.
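
The reuse idea behind drill-down can be illustrated with a minimal Python sketch. It is not the paper's algorithm or the Chase optimizer: the concept hierarchy (`PARENT`), the helper names, the two-step pattern shape, the time window, and the sample stream are all assumptions made for illustration. The sketch evaluates an abstract sequence pattern once, then answers a more specific pattern by filtering those intermediate results instead of rescanning the stream.

```python
# Hypothetical two-level concept hierarchy: ticker -> sector (illustrative only).
PARENT = {"IBM": "Tech", "HPQ": "Tech", "GS": "Finance", "JPM": "Finance"}


def rolls_up_to(event_type, concept):
    """True if event_type is the concept itself or a direct child of it."""
    return event_type == concept or PARENT.get(event_type) == concept


def match_seq(stream, pattern, window):
    """Return every pair (e1, e2) where e1 precedes e2 by at most `window`
    time units and the pair matches the two-step pattern (first, second).
    Assumes `stream` is a list of (timestamp, type) with strictly
    increasing timestamps."""
    first, second = pattern
    matches = []
    for i, (t1, type1) in enumerate(stream):
        if not rolls_up_to(type1, first):
            continue
        for t2, type2 in stream[i + 1:]:
            if t2 - t1 > window:
                break  # later events are even further away
            if rolls_up_to(type2, second):
                matches.append(((t1, type1), (t2, type2)))
    return matches


def drill_down(coarse_matches, pattern):
    """Answer a more specific pattern by filtering the results of a more
    abstract one, without touching the stream again."""
    first, second = pattern
    return [
        (e1, e2)
        for e1, e2 in coarse_matches
        if rolls_up_to(e1[1], first) and rolls_up_to(e2[1], second)
    ]


# Made-up sample stream of (timestamp, ticker) events.
stream = [(1, "IBM"), (2, "GS"), (3, "HPQ"), (4, "JPM"), (9, "GS")]

coarse = match_seq(stream, ("Tech", "Finance"), window=5)
fine = drill_down(coarse, ("IBM", "GS"))
print(len(coarse), fine)
```

Under these assumptions, filtering the coarse results gives the same answer as evaluating the specific pattern directly on the stream, because every match of the specific pattern is also a match of its abstraction. Roll-up works in the opposite direction, combining results of specific patterns to answer a more abstract one.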