On the Computation of Stochastic Search Variable Selection in Linear Regression with UDFs

Authors:
Mario Navas;Carlos Ordonez;Veerabhadran Baladandayuthapani
Venue:
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
Year:
2010



Abstract

Computing Bayesian statistics with traditional techniques is extremely slow, especially when large data sets have to be exported from a relational DBMS. We propose algorithms for large-scale processing of stochastic search variable selection (SSVS) for linear regression that work entirely inside a DBMS. The traditional SSVS algorithm requires multiple scans of the input data to compute a regression model. With our optimizations, SSVS needs either a single scan over the input table, using sufficient statistics, when the number of records is large, or one scan per iteration when the data are high-dimensional. We consider storage layouts that efficiently exploit DBMS parallel processing of aggregate functions. Experimental results demonstrate the correctness, convergence, and performance of our algorithms. Finally, the algorithms scale well to data with a very large number of records or a very high number of dimensions.
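The single-scan strategy works because, for a linear model, the SSVS Gibbs updates touch the data only through a few sums. The following is a minimal sketch under stated assumptions, not the authors' UDF implementation: it uses the George and McCulloch style Gibbs sampler with independent spike-and-slab normal priors, an inverse-gamma prior on sigma^2, and hyperparameter values (tau, c, p, nu, lam) picked for illustration only. The function names `sufficient_statistics` and `ssvs` are hypothetical. The statistics n, X^T X, X^T y and y^T y are accumulated in one pass, the role an aggregate UDF would play inside the DBMS, and every sampler iteration afterwards runs in memory without rescanning the table.

```python
import numpy as np


def sufficient_statistics(rows):
    """Accumulate n, X^T X, X^T y and y^T y in a single pass over (x, y) rows.

    Inside a DBMS this loop would be an aggregate UDF (hypothetical here);
    each term is a sum, so partial results from partitions can be added.
    """
    n, xtx, xty, yty = 0, None, None, 0.0
    for x, y in rows:
        x = np.asarray(x, dtype=float)
        if xtx is None:
            xtx = np.zeros((x.size, x.size))
            xty = np.zeros(x.size)
        n += 1
        xtx += np.outer(x, x)
        xty += y * x
        yty += y * y
    return n, xtx, xty, yty


def ssvs(stats, iters=2000, tau=0.01, c=100.0, p=0.5, nu=1.0, lam=1.0, seed=0):
    """Gibbs sampler for SSVS that reads only the sufficient statistics.

    Assumed (illustrative) prior: beta_j ~ N(0, tau^2) if gamma_j = 0 and
    N(0, (c*tau)^2) if gamma_j = 1, gamma_j ~ Bernoulli(p), and
    sigma^2 ~ InvGamma(nu/2, nu*lam/2). Returns the fraction of iterations
    in which each variable was included (no burn-in discarded, for brevity).
    """
    n, xtx, xty, yty = stats
    d = xty.size
    rng = np.random.default_rng(seed)
    gamma = np.ones(d, dtype=bool)
    sigma2 = 1.0
    freq = np.zeros(d)
    for _ in range(iters):
        # beta | gamma, sigma^2: normal with precision X^T X / sigma^2 + D^-2.
        prior_sd = np.where(gamma, c * tau, tau)
        prec = xtx / sigma2 + np.diag(1.0 / prior_sd**2)
        chol = np.linalg.cholesky(prec)  # prec = L L^T
        mean = np.linalg.solve(prec, xty / sigma2)
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(d))
        # sigma^2 | beta: residual sum of squares expanded into the statistics.
        rss = yty - 2.0 * beta @ xty + beta @ xtx @ beta
        sigma2 = 1.0 / rng.gamma((n + nu) / 2.0, 2.0 / (rss + nu * lam))
        # gamma_j | beta_j: log-odds of slab versus spike density at beta_j.
        log_odds = (np.log(p) - np.log(c * tau) - 0.5 * (beta / (c * tau)) ** 2) - (
            np.log(1 - p) - np.log(tau) - 0.5 * (beta / tau) ** 2
        )
        # Logistic function written with tanh to avoid overflow.
        gamma = rng.random(d) < 0.5 * (1.0 + np.tanh(0.5 * log_odds))
        freq += gamma
    return freq / iters


# Synthetic example: only the first and third predictors matter.
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((500, 4))
y = X @ np.array([2.0, 0.0, -1.5, 0.0]) + 0.5 * data_rng.standard_normal(500)
stats = sufficient_statistics(zip(X, y))  # the only pass over the data
inclusion = ssvs(stats)
print(inclusion)
```

Because every statistic is a sum over records, partial results computed on separate partitions can be added together, which is why this step fits a parallel aggregate function. In this synthetic run the inclusion frequencies should be near 1 for the first and third predictors and small for the other two, although exact values depend on the seed and hyperparameters. The high-dimensional variant, which rescans the data once per iteration, is not shown.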