The Data Cyclotron query processing scheme

Authors:
R. Goncalves; M. Kersten
Affiliations:
CWI, Amsterdam, The Netherlands; CWI, Amsterdam, The Netherlands
Venue:
Proceedings of the 13th International Conference on Extending Database Technology
Year:
2010

Abstract

Distributed database systems exploit static workload characteristics to steer data fragmentation and data allocation schemes. However, the grand challenge of distributed query processing is to come up with a self-organizing architecture that exploits all resources to manage the hot data set, minimize query response time, and maximize throughput without global coordination. In this paper, we introduce the Data Cyclotron architecture, which addresses this challenge using turbulent data movement through a storage ring built from distributed main memory and capitalizing on modern remote-DMA facilities. Queries assigned to individual nodes interact with the Data Cyclotron by picking up data fragments that continuously flow around the ring, i.e., the hot set. Each data fragment carries a level of interest (LOI) metric, which represents the cumulative query interest as the fragment passes around the ring multiple times. A fragment with an LOI below a given threshold, inversely proportional to the ring load, is pulled out to free up resources. This threshold is dynamically adjusted in a distributed manner based on ring characteristics and query needs. It optimizes resource utilization while keeping the average data access delay low. The proposed architecture has a modest impact on existing query execution engines. This is illustrated using an extensive, validated simulation study of the Data Cyclotron protocols. The results underpin their robustness in turbulent workload scenarios as well as in the TPC-H scenario. Furthermore, we think that using state-of-the-art network technology, e.g., RDMA, could lead to even more promising results. The Data Cyclotron architecture opens a new vista for modern distributed database architectures, with a plethora of research challenges barely touched upon.
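
The level-of-interest mechanism described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the protocol from the paper: the names `Fragment`, `Node`, `circulate`, and `decay` are hypothetical, the LOI update rule (aged sum of per-cycle hits) is an assumption chosen for illustration, and the threshold is passed in as a fixed number rather than adjusted dynamically from the ring load.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """A data fragment travelling around the ring (hypothetical structure)."""
    name: str
    loi: float = 0.0  # level of interest accumulated so far


@dataclass
class Node:
    """A ring node; `wanted` names the fragments its local queries need."""
    wanted: set = field(default_factory=set)


def circulate(ring_nodes, hot_set, threshold, max_cycles, decay=0.5):
    """Pass every fragment around the ring and drop those that lose interest.

    Assumed rule (not taken from the paper): on each full cycle, every node
    whose queries want the fragment counts as one hit, and the LOI becomes
    decay * old_loi + hits. Fragments whose LOI falls below `threshold`
    are pulled out of the ring.
    """
    evicted = []
    for _ in range(max_cycles):
        survivors = []
        for frag in hot_set:
            # Count the nodes that pick up this fragment during one trip.
            hits = sum(1 for node in ring_nodes if frag.name in node.wanted)
            # Age the old interest and add the interest seen on this trip.
            frag.loi = decay * frag.loi + hits
            if frag.loi < threshold:
                evicted.append(frag.name)  # pulled out to free ring capacity
            else:
                survivors.append(frag)
        hot_set = survivors
    return hot_set, evicted
```

For example, with three nodes of which two want fragment "A" and none wants fragment "B", calling `circulate(nodes, [Fragment("A"), Fragment("B")], threshold=0.5, max_cycles=3)` keeps "A" in the hot set and evicts "B" after its first trip. A more faithful model would recompute the threshold from the ring load on every cycle, which this sketch deliberately leaves out.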