Embedded predictive modeling in a parallel relational database

  • Authors:
  • A. Dorneich; R. Natarajan; E. Pednault; F. Tipu

  • Affiliations:
  • IBM Software Group, Boeblingen, Germany (A. Dorneich); IBM Thomas J. Watson Research Center, Yorktown Heights, NY (R. Natarajan, E. Pednault, F. Tipu)

  • Venue:
  • Proceedings of the 2006 ACM symposium on Applied computing
  • Year:
  • 2006

Abstract

A methodology is described for embedding predictive modeling algorithms in a commercial parallel database, specifically the parallel editions of IBM DB2 Universal Database, although many aspects of the overall approach carry over to other commercial parallel databases. This parallelization approach was implemented in the Version 8.2 release of DB2 Intelligent Miner Modeling to support a new predictive modeling algorithm called Transform Regression. Like other database-embedded mining algorithms, it provides the usual benefits: easier integration into large enterprise applications, the ability to run entire data mining workflows directly from an SQL-based programming interface, reduced data transfer costs between the database and the data mining application, and faster, parallel data access during query processing. Beyond these benefits, however, a significant part of the data mining computation is also parallelized without any sophisticated parallel programming constructs or any specialized message-passing and parallel synchronization libraries.
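The abstract does not spell out the algorithmic details, but the general pattern it alludes to can be sketched: each database partition scans only its local rows and produces a small, mergeable summary, and the summaries are then combined by simple addition, so no message passing or synchronization is needed during the scan. This is a minimal illustration under stated assumptions: it fits an ordinary simple linear regression (y = a + b·x) rather than Transform Regression, and the names `SuffStats`, `scan_partition`, and `fit` are hypothetical, not part of DB2 or Intelligent Miner.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class SuffStats:
    """Hypothetical sufficient statistics for simple linear regression y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add_row(self, x: float, y: float) -> "SuffStats":
        # Fold one (x, y) row into the running totals.
        return SuffStats(self.n + 1, self.sx + x, self.sy + y,
                         self.sxx + x * x, self.sxy + x * y)

    def merge(self, other: "SuffStats") -> "SuffStats":
        # Merging is plain addition, so it is associative and order-independent.
        return SuffStats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                         self.sxx + other.sxx, self.sxy + other.sxy)


def scan_partition(rows):
    """Summarize one partition's rows; needs no data from any other partition."""
    stats = SuffStats()
    for x, y in rows:
        stats = stats.add_row(x, y)
    return stats


def fit(stats: SuffStats):
    """Solve the least-squares normal equations from the merged statistics."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    if denom == 0:
        raise ValueError("x has zero variance; slope is undefined")
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return intercept, slope


# Toy data split across three "partitions"; points lie on y = 2x + 1.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
# In a parallel database each scan would run on its own node; here they run sequentially.
partial = [scan_partition(p) for p in partitions]
total = reduce(SuffStats.merge, partial, SuffStats())
intercept, slope = fit(total)
print(intercept, slope)  # prints 1.0 2.0
```

Because the per-partition summaries are small and merge by addition, only these summaries, not the raw rows, need to be combined, which is one reason this style of computation fits naturally inside a shared-nothing parallel database.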