XG: a grid-enabled query processing engine

Authors:
Radu Sion;Ramesh Natarajan;Inderpal Narang;Thomas Phan
Affiliations:
Stony Brook University;IBM TJ Watson Research Lab;IBM Almaden Research Lab;IBM Almaden Research Lab
Venue:
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Year:
2006

Citing 2
Cited 0

Human Factors and Web Development

Human Factors and Web Development
XG: a data-driven computation grid for enterprise-scale mining

DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

In [12] we introduce a novel architecture for data processing, based on a functional fusion between a data and a computation layer. In this demo we show how this architecture is leveraged to offer significant speedups for data processing jobs such as data analysis and mining over large data sets. One novel contribution of our solution is its data-driven approach. The computation infrastructure is controlled from within the data layer. Grid compute job submission events are based within the query processor on the DBMS side and in effect controlled by the data processing job to be performed. This allows the early deployment of on-the-fly data aggregation techniques, minimizing the amount of data to be transfered to/from compute nodes and is in stark contrast to existing Grid solutions that interact with data layers as external (mainly) “storage” components. By integrating scheduling intelligence in the data layer itself we show that it is possible to provide a close to optimal solution to the more general grid trade-off between required data replication costs and computation speed-up benefits. We validate this in a scenario derived from a real business deployment, involving financial customer profiling using common types of data analytics.