Systems aspects of probabilistic data management

  • Authors:
  • Magdalena Balazinska; Christopher Ré; Dan Suciu

  • Affiliations:
  • University of Washington, Seattle, WA; University of Washington, Seattle, WA; University of Washington, Seattle, WA

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2008

Abstract

There has been wide interest recently in managing probabilistic data [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]. However, following the rich literature on probabilistic databases often requires a detour into probability theory, correlations, conditionals, Monte Carlo simulation, and error bounds, topics that have been studied extensively in several areas of computer science and mathematics. As a result, it is often difficult to get to the algorithmic and systems-level aspects of probabilistic data management. In this tutorial, we will distill these aspects from the often theory-heavy literature on probabilistic databases. We will start by describing a real application at the University of Washington, the RFID Ecosystem; we will show how probabilities arise naturally and why we need to cope with them. We will then describe what an implementor needs to know to process SQL queries on probabilistic databases. In the second half of the tutorial, we will discuss more advanced issues, such as event processing over probabilistic streams and views over probabilistic data.
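To make "processing SQL queries on probabilistic databases" and "Monte Carlo simulation" concrete, here is a minimal sketch, not taken from the tutorial. It assumes the simple tuple-independent model, in which each row is present independently with its own probability, and a hypothetical table of noisy RFID sightings (the table, names, and probabilities are invented for illustration). It evaluates a Boolean query in the spirit of `SELECT DISTINCT 'yes' FROM sightings WHERE person = 'Alice' AND room = 'CSE-101'`, first exactly and then by sampling possible worlds.

```python
import random

# Hypothetical tuple-independent table (illustrative data only):
# each (person, room, p) row exists independently with probability p,
# e.g. because the underlying RFID reading is noisy.
sightings = [
    ("Alice", "CSE-101", 0.9),
    ("Alice", "CSE-101", 0.4),
    ("Alice", "CSE-220", 0.7),
    ("Bob",   "CSE-101", 0.5),
]

def exact_prob(rows, person, room):
    """P(at least one matching row exists) = 1 - prod(1 - p) over matching rows.

    Valid only under the independence assumption above; correlated
    tuples would require a different evaluation strategy.
    """
    prob_none = 1.0
    for who, where, p in rows:
        if who == person and where == room:
            prob_none *= 1.0 - p
    return 1.0 - prob_none

def monte_carlo_prob(rows, person, room, trials=100_000, seed=0):
    """Estimate the same probability by sampling possible worlds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Sample one possible world: keep each row independently with probability p.
        world = [(who, where) for who, where, p in rows if rng.random() < p]
        if (person, room) in world:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(round(exact_prob(sightings, "Alice", "CSE-101"), 4))        # 0.94
    print(round(monte_carlo_prob(sightings, "Alice", "CSE-101"), 2))  # close to 0.94
```

The exact formula is cheap here because the matching rows are independent. Monte Carlo sampling trades exactness for generality, with an error that shrinks roughly as 1/sqrt(trials).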