Approximation trade-offs in a Markovian stream warehouse: An empirical study

Authors:
J. Letchner; M. Balazinska; C. Ré; M. Philipose
Affiliations:
Microsoft Corporation, Bellevue, WA, United States; University of Washington, Seattle, WA, United States; University of Wisconsin, Madison, WI, United States; Intel Research, Seattle, WA, United States
Venue:
Information Systems
Year:
2014


Abstract

Much of the world's data is both sequential and low-level. Many applications need to query higher-level information (e.g., words and sentences) that is inferred from these low-level sequences (e.g., raw audio signals) using a model (e.g., a hidden Markov model). This inference process is typically statistical, so the resulting high-level sequences are imprecise. Once archived, these imprecise streams are difficult to query efficiently because of their rich semantics and large volumes, forcing applications to sacrifice either performance or accuracy. Little work, however, characterizes this trade-off space or helps applications make an appropriate choice. In this paper, we study the effects, on both efficiency and accuracy, of various stream approximations such as ignoring correlations, ignoring low-probability states, or retaining only the single most likely sequence of events. Through experiments on a real-world RFID data set, we identify conditions under which various approximations can improve performance by several orders of magnitude with only minimal effects on query results. We also identify cases in which the full, rich semantics are necessary. This study is the first to evaluate the cost vs. quality trade-off of imprecise stream models. We perform this study using Lahar, a prototype Markovian stream warehouse. A secondary contribution of this paper is the development of query semantics and algorithms for processing aggregation queries on the output of pattern queries; we develop these queries in order to understand more fully the effects of approximation on a wider set of imprecise stream queries.
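
The three approximations named in the abstract can be illustrated on a toy Markovian stream. The sketch below is not Lahar's implementation: the two-state stream, its probabilities, and every function name are hypothetical, chosen only to show what each approximation keeps and what it discards.

```python
from itertools import product

# Hypothetical toy stream: two locations observed over three timesteps.
STATES = ["hall", "office"]
initial = {"hall": 0.6, "office": 0.4}
# transitions[t][prev][nxt] = P(state at t+1 = nxt | state at t = prev)
transitions = [
    {"hall": {"hall": 0.9, "office": 0.1},
     "office": {"hall": 0.2, "office": 0.8}},
    {"hall": {"hall": 0.5, "office": 0.5},
     "office": {"hall": 0.05, "office": 0.95}},
]


def marginals(initial, transitions):
    """Forward pass producing the marginal distribution at each timestep."""
    out = [dict(initial)]
    for cpt in transitions:
        prev = out[-1]
        out.append({s: sum(prev[p] * cpt[p][s] for p in STATES) for s in STATES})
    return out


def sequence_prob(seq, initial, transitions):
    """Exact probability of a full state sequence, using the correlations."""
    prob = initial[seq[0]]
    for t, cpt in enumerate(transitions):
        prob *= cpt[seq[t]][seq[t + 1]]
    return prob


def independent_prob(seq, margs):
    """Approximation 1: ignore correlations and multiply per-step marginals."""
    prob = 1.0
    for t, state in enumerate(seq):
        prob *= margs[t][state]
    return prob


def prune(dist, threshold):
    """Approximation 2: drop states below `threshold`, then renormalize."""
    kept = {s: p for s, p in dist.items() if p >= threshold}
    if not kept:  # keep the single best state rather than an empty distribution
        best = max(dist, key=dist.get)
        kept = {best: dist[best]}
    total = sum(kept.values())
    return {s: p / total for s, p in kept.items()}


def most_likely_sequence(initial, transitions):
    """Approximation 3: keep only the most likely sequence (brute force)."""
    length = len(transitions) + 1
    return max(product(STATES, repeat=length),
               key=lambda seq: sequence_prob(seq, initial, transitions))
```

In this toy stream the most likely sequence stays in the office throughout, even though the first timestep's marginal favors the hall, which is one way an approximation can change a query answer. The brute-force search is exponential in stream length and is used here only for brevity; a dynamic-programming method such as the Viterbi algorithm would be the usual choice.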