The DataPath system: a data-centric analytic processing engine for large data warehouses

Authors: Subi Arumugam, Alin Dobra (University of Florida, Gainesville, FL, USA); Christopher M. Jermaine, Niketan Pansare, Luis Perez (Rice University, Houston, TX, USA)
Venue: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data
Year: 2010

Abstract

Since the 1970s, database systems have been "compute-centric": when a computation needs data, it requests them, and the data are pulled through the system. We believe this is problematic for two reasons. First, requesting data naturally incurs high latency as the data are pulled through the memory hierarchy. Second, the pull model makes it difficult or impossible for multiple queries or operations that are interested in the same data to amortize the bandwidth and latency costs of accessing those data. In this paper, we describe DataPath, a purely push-based research prototype database system. DataPath is "data-centric": queries do not request data. Instead, data are automatically pushed onto processors, where they are processed by any interested computation. We show experimentally on a multi-terabyte benchmark that this basic design principle makes for a very lean and fast database system.
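The amortization argument can be illustrated with a toy, single-threaded sketch. It is not DataPath's implementation: the names `Table`, `FilteredSum`, `run_pull`, `run_push`, and `make_queries` are hypothetical, and each `read_chunk` call merely stands in for one trip of a chunk through the memory hierarchy. In the pull (compute-centric) version, every query drives its own scan. In the push (data-centric) version, each chunk is read once and handed to every interested query.

```python
from typing import Callable, List

Chunk = List[int]


class Table:
    """Hypothetical in-memory table stored as fixed-size chunks."""

    def __init__(self, rows: List[int], chunk_size: int = 4) -> None:
        self.chunks: List[Chunk] = [
            rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)
        ]
        self.chunk_reads = 0  # how many times any chunk was fetched

    def read_chunk(self, i: int) -> Chunk:
        # Each call stands in for one trip of a chunk through the memory hierarchy.
        self.chunk_reads += 1
        return self.chunks[i]


class FilteredSum:
    """Hypothetical query: SUM(x) over rows satisfying a predicate."""

    def __init__(self, pred: Callable[[int], bool]) -> None:
        self.pred = pred
        self.total = 0

    def consume(self, chunk: Chunk) -> None:
        self.total += sum(x for x in chunk if self.pred(x))


def run_pull(table: Table, queries: List[FilteredSum]) -> List[int]:
    """Compute-centric: each query requests every chunk on its own."""
    for q in queries:
        for i in range(len(table.chunks)):
            q.consume(table.read_chunk(i))
    return [q.total for q in queries]


def run_push(table: Table, queries: List[FilteredSum]) -> List[int]:
    """Data-centric: each chunk is read once and pushed to every interested query."""
    for i in range(len(table.chunks)):
        chunk = table.read_chunk(i)
        for q in queries:
            q.consume(chunk)
    return [q.total for q in queries]


def make_queries() -> List[FilteredSum]:
    return [
        FilteredSum(lambda x: x % 2 == 0),  # sum of even values
        FilteredSum(lambda x: x > 10),      # sum of values above 10
        FilteredSum(lambda x: True),        # sum of all values
    ]


if __name__ == "__main__":
    rows = list(range(1, 17))  # 16 rows -> 4 chunks of 4

    pulled = Table(rows)
    print(run_pull(pulled, make_queries()), pulled.chunk_reads)  # [72, 81, 136] 12

    pushed = Table(rows)
    print(run_push(pushed, make_queries()), pushed.chunk_reads)  # [72, 81, 136] 4
```

Both runs return the same answers. Only the number of chunk fetches differs (12 versus 4, that is, one scan per query versus one shared scan), which is the amortization the abstract describes. The sketch deliberately leaves out the latency argument, parallelism across processors, and how a real engine decides which computations are interested in which data.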