Embedded predictive modeling in a parallel relational database

Authors:
A. Dorneich;R. Natarajan;E. Pednault;F. Tipu
Affiliations:
IBM Software Group, Boeblingen, Germany;IBM Thomas J. Watson Research Center, Yorktown Heights NY;IBM Thomas J. Watson Research Center, Yorktown Heights NY;IBM Thomas J. Watson Research Center, Yorktown Heights NY
Venue:
Proceedings of the 2006 ACM symposium on Applied computing
Year:
2006

Citing 13
Cited 3

On parallel processing of aggregate and scalar functions in object-relational DBMS
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Business applications of data mining
Communications of the ACM - Evolving data mining into solutions for insights

Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications
Data Mining and Knowledge Discovery

Stochastic gradient boosting
Computational Statistics & Data Analysis - Nonlinear methods and data mining

Using SQL to Build New Aggregates and Extenders for Object-Relational Systems
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases

Integration of Data Mining with Database Technology
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases

SPRINT: A Scalable Parallel Classifier for Data Mining
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases

Efficient Mining for Association Rules with Relational Database Systems
IDEAS '99 Proceedings of the 1999 International Symposium on Database Engineering & Applications

Scalable Mining for Classification Rules in Relational Databases
IDEAS '98 Proceedings of the 1998 International Symposium on Database Engineering & Applications

Efficient Evaluation of Queries with Mining Predicates
ICDE '02 Proceedings of the 18th International Conference on Data Engineering

A bi-level Bernoulli scheme for database sampling
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data

COMBI-operator - database support for data mining applications
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

A probabilistic estimation framework for predictive modeling analytics
IBM Systems Journal

A grid-based approach for enterprise-scale data mining
Future Generation Computer Systems - Special section: Data mining in grid computing environments

Using Data Mining Algorithms in Web Performance Prediction
Cybernetics and Systems


Abstract

A methodology is described for embedding predictive modeling algorithms in a commercial parallel database, specifically the parallel editions of IBM DB2 Universal Database, although many aspects of the overall approach can be used with other commercial parallel databases. This parallelization approach was implemented in the Version 8.2 release of DB2 Intelligent Miner Modeling to support a new predictive modeling algorithm called Transform Regression. This database-embedded mining algorithm provides all the usual benefits of database-embedded mining, including easier integration into large enterprise applications, the ability to perform entire data mining workflows directly from an SQL-based programming interface, reduced data transfer costs between the database and the data mining application, and faster, parallel data access during query processing. In addition to these benefits, a significant part of the data mining computation is also parallelized without the use of any sophisticated parallel programming constructs or any specialized message-passing and parallel synchronization libraries.
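
The abstract does not spell out the mechanism, so the following is only an illustrative sketch of one general way a modeling computation can be parallelized across database partitions without message passing: each partition independently reduces its rows to a small set of sufficient statistics, and a final step merges those partial results. It uses ordinary least-squares regression as a stand-in (it is not Transform Regression), and every name in it (`Stats`, `scan_partition`, `fit_partitioned`) is hypothetical rather than part of DB2 or Intelligent Miner.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics for a least-squares fit of y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add(self, x: float, y: float) -> None:
        # Fold one row into the running sums.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def merge(self, other: "Stats") -> "Stats":
        # Combining two partial results is plain addition, so the
        # partitions never need to communicate while scanning.
        return Stats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )


def scan_partition(rows) -> Stats:
    """Assumed per-partition step: runs on one partition's rows only."""
    stats = Stats()
    for x, y in rows:
        stats.add(x, y)
    return stats


def fit_partitioned(partitions):
    """Merge the per-partition statistics, then solve for (intercept, slope)."""
    total = Stats()
    for rows in partitions:
        total = total.merge(scan_partition(rows))
    denom = total.n * total.sxx - total.sx ** 2
    slope = (total.n * total.sxy - total.sx * total.sy) / denom
    intercept = (total.sy - slope * total.sx) / total.n
    return intercept, slope


# Hypothetical data split across three partitions; all points lie on y = 1 + 2x.
partitions = [[(0, 1), (1, 3)], [(2, 5)], [(3, 7), (4, 9)]]
intercept, slope = fit_partitioned(partitions)
```

In this sketch the only coordination point is the final merge, which mirrors how a parallel database combines partial aggregates from its partitions; the loop over partitions runs sequentially here purely to keep the example self-contained.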