Lineage for Markovian stream event queries

  • Authors: Julie Letchner; Magdalena Balazinska
  • Affiliations: Microsoft; University of Washington
  • Venue: Proceedings of the 10th ACM International Workshop on Data Engineering for Wireless and Mobile Access
  • Year: 2011

Abstract

Imprecise, sequential data, such as location sequences inferred from RFID/GPS, are often represented as Markovian (probabilistic, temporally correlated) streams. Event queries, which detect instances of specific patterns in these streams, have become the standard tool for analyzing them; however, many data mining applications require richer information, such as how a pattern is matched, how long the match is, or what stream elements matched specific pattern predicates. Such queries can dramatically increase the power of applications, but they cannot be answered by existing tools. In this paper, we present novel techniques for processing the above queries on Markovian streams. Central to our approach are algorithms for computing and manipulating the lineage of Markovian stream event queries. We provide formal definitions and linear-time algorithms for computing lineage, which may be exponentially sized in the length of the input stream. We additionally demonstrate the importance of flexible lineage projections, and provide definitions of, and two efficient algorithms for, these projections. We evaluate all algorithms on two real-world data sets (location from RFID and words from spoken audio), and demonstrate that lineage can greatly increase the analytical power of applications while incurring small processing overhead.
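
As a rough illustration of the setting the abstract describes, the sketch below models a Markovian stream as an initial distribution plus one conditional probability table (CPT) per timestep, and evaluates a very simple event query: "the stream is in state A at time t and in state B at time t+1". This is not the paper's lineage algorithm. The representation, the function names (marginals, match_probabilities), and the toy location data are all assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical representation (an assumption, not taken from the paper):
# a distribution maps state -> probability, and a CPT maps
# previous state -> distribution over the current state.
Dist = Dict[str, float]
CPT = Dict[str, Dist]


def marginals(initial: Dist, cpts: List[CPT]) -> List[Dist]:
    """Forward pass: marginal distribution of the stream at every timestep."""
    result = [dict(initial)]
    for cpt in cpts:
        prev = result[-1]
        cur: Dist = {}
        for prev_state, prev_prob in prev.items():
            for cur_state, cond_prob in cpt.get(prev_state, {}).items():
                cur[cur_state] = cur.get(cur_state, 0.0) + prev_prob * cond_prob
        result.append(cur)
    return result


def match_probabilities(
    initial: Dist, cpts: List[CPT], first: str, second: str
) -> List[float]:
    """P(X_t = first and X_{t+1} = second) for each t, using the CPTs
    so that the temporal correlation between timesteps is respected."""
    margs = marginals(initial, cpts)
    return [
        margs[t].get(first, 0.0) * cpts[t].get(first, {}).get(second, 0.0)
        for t in range(len(cpts))
    ]


# Toy stream: a person is in the hall or the office; once in the
# office they stay there (illustrative numbers only).
stream_initial: Dist = {"hall": 0.5, "office": 0.5}
step: CPT = {
    "hall": {"hall": 0.5, "office": 0.5},
    "office": {"office": 1.0},
}
stream_cpts: List[CPT] = [step, step]

stream_marginals = marginals(stream_initial, stream_cpts)
entered_office = match_probabilities(stream_initial, stream_cpts, "hall", "office")

# Treating timesteps as independent would multiply marginals instead,
# which ignores the correlation encoded in the CPTs.
independent_guess = stream_marginals[0]["hall"] * stream_marginals[1]["office"]
```

In this toy stream the correlated computation and the independence-based product give different values for the first timestep, which is the reason event queries on Markovian streams have to use the conditional tables rather than per-timestep marginals alone. The sketch only returns match probabilities; recording which stream elements produced each match, as lineage does, is beyond it.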