Computing Bayesian statistics with traditional techniques is extremely slow, especially when large data sets have to be exported from a relational DBMS. We propose algorithms for large-scale processing of stochastic search variable selection (SSVS) for linear regression that work entirely inside a DBMS. The traditional SSVS algorithm requires multiple scans of the input data to compute a regression model. With our optimizations, SSVS can be done either in one scan over the input table, using sufficient statistics, when the number of records is large, or in one scan per iteration for high-dimensional data. We consider storage layouts that efficiently exploit DBMS parallel processing of aggregate functions. Experimental results demonstrate the correctness, convergence and performance of our algorithms. Finally, the algorithms scale well for data with a very large number of records or a very high number of dimensions.
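The one-scan idea can be illustrated with a minimal Python sketch. The abstract does not state the exact form of the sufficient statistics, so this sketch assumes a common choice: the matrix Q, the sum of z zᵀ over all records, with z = (1, x_1, ..., x_d, y). The function names `scan_statistics`, `solve` and `fit_subset` are hypothetical, and the sketch fits ordinary least squares for a chosen variable subset rather than running the SSVS sampler itself. It only shows why candidate subsets can be evaluated without rescanning the table.

```python
def scan_statistics(rows):
    """Single pass over rows (x_1, ..., x_d, y): accumulate Q = sum of z z^T,
    where z = (1, x_1, ..., x_d, y). Assumed form of the sufficient statistics."""
    Q = None
    for row in rows:
        z = [1.0] + [float(v) for v in row]
        if Q is None:
            Q = [[0.0] * len(z) for _ in z]
        for i, zi in enumerate(z):
            for j, zj in enumerate(z):
                Q[i][j] += zi * zj
    return Q


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def fit_subset(Q, subset):
    """Least-squares coefficients (intercept first) for the predictors whose
    0-based positions are in `subset`, computed from Q alone (no data rescan)."""
    y = len(Q) - 1                       # last row/column of Q belongs to y
    idx = [0] + [j + 1 for j in subset]  # index 0 is the constant term
    A = [[Q[i][j] for j in idx] for i in idx]
    b = [Q[i][y] for i in idx]
    return solve(A, b)


# Toy data (made up for illustration): y = 1 + 2*x1 - 3*x2, x3 is irrelevant.
xs = [(0, 0, 1), (1, 0, 0), (0, 1, 2), (1, 1, 5), (2, 1, 3), (3, 2, 1)]
rows = [(x1, x2, x3, 1 + 2 * x1 - 3 * x2) for x1, x2, x3 in xs]

Q = scan_statistics(rows)            # the only pass over the data
beta_12 = fit_subset(Q, [0, 1])      # model with x1 and x2
beta_all = fit_subset(Q, [0, 1, 2])  # model with x1, x2 and x3
```

Because the toy data are generated exactly from 1 + 2·x1 − 3·x2, both fits recover those coefficients up to rounding, and the second assigns a coefficient near zero to x3. In a DBMS the accumulation loop would correspond to an aggregate computed in a single table scan; Q has (d+2)² entries, which may explain why the abstract treats high-dimensional data separately with one scan per iteration.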