E-Cube: multi-dimensional event sequence analysis using hierarchical pattern query sharing

  • Authors:
  • Mo Liu;Elke Rundensteiner;Kara Greenfield;Chetan Gupta;Song Wang;Ismail Ari;Abhay Mehta

  • Affiliations:
  • Worcester Polytechnic Institute, Worcester, MA, USA;Worcester Polytechnic Institute, Worcester, MA, USA;Worcester Polytechnic Institute, Worcester, MA, USA;Hewlett-Packard Labs, Palo Alto, CA, USA;Hewlett-Packard Labs, Palo Alto, CA, USA;Ozyegin University, Istanbul, Turkey;Hewlett-Packard Labs, Austin, TX, USA

  • Venue:
  • Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2011

Abstract

Many modern applications, including online financial feeds, tag-based mass transit systems, and RFID-based supply chain management systems, transmit real-time data streams. Event stream processing technology is needed to analyze this vast amount of sequential data and enable online operational decision making. Existing techniques such as traditional online analytical processing (OLAP) systems are not designed for real-time pattern-based operations, while state-of-the-art Complex Event Processing (CEP) systems designed for sequence detection do not support OLAP operations. We propose a novel E-Cube model that combines CEP and OLAP techniques for efficient multi-dimensional event pattern analysis at different abstraction levels. Our analysis of the interrelationships among queries, in both concept abstraction and pattern refinement, facilitates the composition of these queries into an integrated E-Cube hierarchy. Based on this E-Cube hierarchy, we develop strategies of drill-down (refinement from abstract to more specific patterns) and roll-up (generalization from specific to more abstract patterns) for efficient workload evaluation. Our proposed execution strategies reuse intermediate results along both the concept and the pattern refinement relationships between queries. Based on this foundation, we design a cost-driven adaptive optimizer called Chase that exploits these reuse strategies for optimal E-Cube hierarchy execution. Our experimental studies, which compare alternative strategies on a real-world financial data stream under different workload conditions, demonstrate the superiority of the Chase method. In particular, Chase execution in many cases runs tenfold faster than the state-of-the-art strategy for real stock market query workloads.
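
The idea of roll-up with reuse along the concept hierarchy can be illustrated with a minimal sketch. This is not the Chase optimizer or the authors' implementation: the hierarchy, the names `HIERARCHY`, `leaves`, `match_seq`, and `roll_up`, and the sample stream are all hypothetical, and the sketch ignores time windows, predicates, and cost-based planning. It only shows that matches of an abstract sequence pattern can be assembled from cached matches of its more specific refinements, assuming the child concepts partition the parent.

```python
from itertools import product

# Hypothetical concept hierarchy: abstract concept -> leaf event types.
HIERARCHY = {"Tech": ["AAPL", "MSFT"], "Finance": ["JPM"]}


def leaves(concept):
    """Leaf event types covered by a concept (a leaf covers only itself)."""
    return HIERARCHY.get(concept, [concept])


def match_seq(stream, pattern):
    """Return index tuples where events matching the pattern occur in order.

    stream: list of (timestamp, event_type), ordered by timestamp.
    pattern: tuple of concepts, e.g. ("Tech", "Finance").
    """
    results = []

    def extend(prefix, start, depth):
        if depth == len(pattern):
            results.append(tuple(prefix))
            return
        allowed = set(leaves(pattern[depth]))
        for i in range(start, len(stream)):
            if stream[i][1] in allowed:
                extend(prefix + [i], i + 1, depth + 1)

    extend([], 0, 0)
    return results


def roll_up(stream, pattern, cache):
    """Answer an abstract pattern by unioning results of its leaf-level
    refinements, computing each refinement at most once via the cache."""
    matches = []
    for specific in product(*(leaves(c) for c in pattern)):
        if specific not in cache:
            cache[specific] = match_seq(stream, specific)
        matches.extend(cache[specific])
    return sorted(matches)


stream = [(1, "AAPL"), (2, "JPM"), (3, "MSFT"), (4, "JPM")]
cache = {}
rolled = roll_up(stream, ("Tech", "Finance"), cache)
direct = sorted(match_seq(stream, ("Tech", "Finance")))
```

Here `rolled` and `direct` contain the same matches, and `cache` afterwards holds the results for the two specific patterns, so a later query on, say, `("AAPL", "JPM")` could be served without rescanning the stream. Drill-down would work in the opposite direction, filtering the matches of an abstract pattern down to a specific one.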