Data mining research is extensive, but most of it has proposed efficient algorithms, data structures, and optimizations that work outside a DBMS, mostly on flat files. In contrast, we present a data mining system that works on top of a relational DBMS, based on a combination of SQL queries and User-Defined Functions (UDFs), debunking the common perception that SQL is inefficient or inadequate for data mining. We show that our system can analyze large data sets significantly faster than external data mining tools. Moreover, our UDF-based algorithms process a data set in one pass and scale linearly.
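To make the one-pass idea concrete, the following is a minimal sketch of an aggregate UDF called from a SQL query. It is not the system described above. SQLite, accessed through Python's standard `sqlite3` module, stands in for the relational DBMS, and the names `SumStats`, `sum_stats`, `points`, and `summarize` are hypothetical. The choice of summary statistics (count, sum, and sum of squares) is also an assumption made for illustration.

```python
import json
import sqlite3


class SumStats:
    """Hypothetical aggregate UDF: accumulates n, sum(x), sum(x^2) in one pass."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def step(self, x):
        # Called once per row; NULLs are skipped, as built-in SQL aggregates do.
        if x is None:
            return
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def finalize(self):
        # A SQLite UDF returns a single scalar, so the result is packed as JSON text.
        if self.n == 0:
            return None
        mean = self.total / self.n
        variance = self.total_sq / self.n - mean * mean  # population variance
        return json.dumps({"n": self.n, "mean": mean, "variance": variance})


def summarize(values):
    """Load values into an in-memory table and summarize them with one scan."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("sum_stats", 1, SumStats)  # register the UDF
    conn.execute("CREATE TABLE points (x REAL)")
    conn.executemany("INSERT INTO points VALUES (?)", [(v,) for v in values])
    (packed,) = conn.execute("SELECT sum_stats(x) FROM points").fetchone()
    conn.close()
    return json.loads(packed) if packed is not None else None


if __name__ == "__main__":
    print(summarize([1.0, 2.0, 3.0, 4.0]))
```

Each row is touched exactly once and the per-row work is constant, so the running time grows linearly with the number of rows. The sum-of-squares formula for the variance can lose precision when the mean is large relative to the spread; a production implementation would likely use a more numerically stable update.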