The DataPath system: a data-centric analytic processing engine for large data warehouses

  • Authors:
  • Subi Arumugam; Alin Dobra; Christopher M. Jermaine; Niketan Pansare; Luis Perez

  • Affiliations:
  • University of Florida, Gainesville, FL, USA; University of Florida, Gainesville, FL, USA; Rice University, Houston, TX, USA; Rice University, Houston, TX, USA; Rice University, Houston, TX, USA

  • Venue:
  • Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data
  • Year:
  • 2010

Abstract

Since the 1970s, database systems have been "compute-centric". When a computation needs data, it requests the data, and the data are pulled through the system. We believe that this is problematic for two reasons. First, requests for data naturally incur high latency as the data are pulled through the memory hierarchy. Second, the pull model makes it difficult or impossible for multiple queries or operations that are interested in the same data to amortize the bandwidth and latency costs associated with their data access. In this paper, we describe a purely push-based research prototype database system called DataPath. DataPath is "data-centric". In DataPath, queries do not request data. Instead, data are automatically pushed onto processors, where they are then processed by any interested computation. We show experimentally on a multi-terabyte benchmark that this basic design principle makes for a very lean and fast database system.
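
The push-based idea in the abstract can be illustrated with a toy sketch. This is not DataPath's implementation; the names `PushScan`, `SumQuery`, `CountQuery`, and `demo` are hypothetical and exist only for this example. It assumes data arrive in chunks and that each query is a callable that registers interest in the scan. The point it shows is that each chunk is read once and handed to every interested query, so the cost of the read is shared.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, int]
Chunk = List[Row]


class PushScan:
    """Hypothetical shared scan: reads each chunk once and pushes it
    to every registered consumer (illustrative only)."""

    def __init__(self, chunks: Iterable[Chunk]) -> None:
        self.chunks = chunks
        self.consumers: List[Callable[[Chunk], None]] = []
        self.chunks_read = 0  # counts reads from the (simulated) storage

    def register(self, consumer: Callable[[Chunk], None]) -> None:
        self.consumers.append(consumer)

    def run(self) -> None:
        for chunk in self.chunks:
            self.chunks_read += 1          # one read per chunk ...
            for consume in self.consumers:
                consume(chunk)             # ... shared by all queries


class SumQuery:
    """Sums one column over every chunk pushed to it."""

    def __init__(self, column: str) -> None:
        self.column = column
        self.total = 0

    def __call__(self, chunk: Chunk) -> None:
        self.total += sum(row[self.column] for row in chunk)


class CountQuery:
    """Counts rows that satisfy a predicate."""

    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate
        self.count = 0

    def __call__(self, chunk: Chunk) -> None:
        self.count += sum(1 for row in chunk if self.predicate(row))


def demo() -> Tuple[int, int, int]:
    # Two chunks of made-up rows standing in for table data.
    chunks = [
        [{"a": 1, "b": 10}, {"a": 2, "b": 20}],
        [{"a": 3, "b": 30}],
    ]
    scan = PushScan(chunks)
    total_b = SumQuery("b")
    big_a = CountQuery(lambda row: row["a"] > 1)
    scan.register(total_b)
    scan.register(big_a)
    scan.run()
    return total_b.total, big_a.count, scan.chunks_read
```

In a pull-based design, each of the two queries would typically drive its own scan and read both chunks itself, for four reads in total; in this sketch the two queries share two reads.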