Querying continuous functions in a database system

  • Authors: Arvind Thiagarajan; Samuel Madden

  • Affiliations: MIT CSAIL, Cambridge, MA, USA; MIT CSAIL, Cambridge, MA, USA

  • Venue: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data
  • Year: 2008

Abstract

Many scientific, financial, data mining, and sensor network applications need to work with continuous, rather than discrete, data: for example, temperature as a function of location, or stock prices or vehicle trajectories as a function of time. Querying raw or discrete data is unsatisfactory for these applications; in a sensor network, for instance, it is necessary to interpolate sensor readings to predict values at locations where sensors are not deployed. In other situations, raw data can be inaccurate owing to measurement errors, and it is useful to fit continuous functions to the raw data and query the functions rather than the raw data itself, e.g., fitting a smooth curve to noisy sensor readings, or a smooth trajectory to GPS data containing gaps or outliers. Existing databases do not support storing or querying continuous functions, short of brute-force discretization of functions into a collection of tuples. We present FunctionDB, a novel database system that treats mathematical functions as first-class citizens that can be queried like traditional relations. The key contribution of FunctionDB is an efficient and accurate algebraic query processor: for the broad class of multi-variable polynomial functions, FunctionDB executes queries directly on the algebraic representation of functions, without materializing them into discrete points, using symbolic operations (zero finding, variable substitution, and integration). Even when closed-form solutions are intractable, FunctionDB leverages symbolic approximation operations to improve performance. We evaluate FunctionDB on real data sets from a temperature sensor network and on traffic traces from Boston roads. We show that operating in the functional domain has substantial advantages in accuracy (15-30%) and delivers order-of-magnitude (10x-100x) performance gains over existing approaches that represent models as discrete collections of points.
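
To make the contrast between algebraic and discretized query processing concrete, the following is a minimal sketch, not FunctionDB's actual implementation. It assumes a hypothetical single-variable model, temperature(t) = 20 + 3t - 0.5t^2 on the interval [0, 6], stored as a list of polynomial coefficients. The function names (`evaluate`, `antiderivative`, `average_value`, `crossing_times`) are illustrative assumptions, and the zero finder is deliberately limited to polynomials of degree two or less so that a closed-form solution applies.

```python
import math

def evaluate(coeffs, t):
    """Evaluate a polynomial (coefficients in ascending order) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result

def antiderivative(coeffs):
    """Symbolic integration: coefficients of the antiderivative, constant term 0."""
    return [0.0] + [c / (k + 1) for k, c in enumerate(coeffs)]

def average_value(coeffs, lo, hi):
    """Exact average of the polynomial over [lo, hi] via its antiderivative."""
    F = antiderivative(coeffs)
    return (evaluate(F, hi) - evaluate(F, lo)) / (hi - lo)

def crossing_times(coeffs, threshold, lo, hi):
    """Zero finding (degree <= 2 only): solve f(t) - threshold = 0 in closed form."""
    if len(coeffs) > 3:
        raise ValueError("this sketch only handles polynomials of degree <= 2")
    padded = list(coeffs) + [0.0] * (3 - len(coeffs))
    c, b, a = padded[0] - threshold, padded[1], padded[2]
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    # Keep only distinct roots that fall inside the model's domain.
    return sorted(r for r in set(roots) if lo <= r <= hi)

# Hypothetical model: temperature(t) = 20 + 3t - 0.5t^2 for t in [0, 6].
temperature = [20.0, 3.0, -0.5]

# "When is the temperature exactly 24?" answered by zero finding.
times_at_24 = crossing_times(temperature, 24.0, 0.0, 6.0)   # [2.0, 4.0]

# "What is the average temperature?" answered by integration (about 23.0).
exact_avg = average_value(temperature, 0.0, 6.0)

# Brute-force alternative: materialize seven tuples and average them (22.5).
samples = [evaluate(temperature, float(t)) for t in range(7)]
discrete_avg = sum(samples) / len(samples)
```

In this toy case the algebraic average is about 23.0 while the seven-point discretization gives 22.5, and the threshold query returns the exact crossing times rather than the nearest grid points. A real system would also need piecewise models, multiple variables, and approximation where no closed form exists, none of which this sketch attempts.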