Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Integrating association rule mining with relational database systems: alternatives and implications
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
BOAT—optimistic decision tree construction
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
NonStop SQL/MX primitives for knowledge discovery
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Finding generalized projected clusters in high dimensional spaces
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
SQLEM: fast clustering in SQL using the EM algorithm
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
SQL database primitives for decision tree classifiers
Proceedings of the tenth international conference on Information and knowledge management
FREM: fast and robust EM clustering for large data sets
Proceedings of the eleventh international conference on Information and knowledge management
Optimizing Main-Memory Join on Modern Hardware
IEEE Transactions on Knowledge and Data Engineering
Integrating Data Mining with SQL Databases: OLE DB for Data Mining
Proceedings of the 17th International Conference on Data Engineering
Spreadsheets in RDBMS for OLAP
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Vertical and horizontal percentage aggregations
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Horizontal aggregations for building tabular data sets
Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Programming the K-means clustering algorithm in SQL
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Fundamentals of Database Systems, Fourth Edition
Fundamentals of Database Systems, Fourth Edition
Proceedings of the ACM 11th international workshop on Data warehousing and OLAP
Microarray data analysis with PCA in a DBMS
Proceedings of the 2nd international workshop on Data and text mining in bioinformatics
Efficient computation of PCA with SVD in SQL
Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors
Extend UDF Technology for Integrated Analytics
DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
Efficiently support MapReduce-like computation models inside parallel DBMS
IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium
Fast UDFs to compute sufficient statistics on large data sets exploiting caching and sampling
Data & Knowledge Engineering
Extend core UDF framework for GPU-enabled analytical query evaluation
Proceedings of the 15th Symposium on International Database Engineering & Applications
Dynamic optimization of generalized SQL queries with horizontal aggregations
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Hi-index | 0.00 |
In general, a relational DBMS provides limited capabilities to perform multidimensional statistical analysis, which requires manipulating vectors and matrices. In this work, we study how to extend a DBMS with basic vector and matrix operators by programming User-Defined Functions (UDFs). We carefully analyze UDF features and limitations to implement vector and matrix operations commonly used in statistics, machine learning and data mining, paying attention to DBMS, operating system and computer architecture constraints. UDFs represent a C programming interface that allows the definition of scalar and aggregate functions that can be used in SQL. UDFs have several advantages and limitations. A UDF allows fast evaluation of arithmetic expressions, memory manipulation, using multidimensional arrays and exploiting all C language control statements. Nevertheless, a UDF cannot perform disk I/O, the amount of heap and stack memory that can be allocated is small and the UDF code must consider specific architecture characteristics of the DBMS. We experimentally compare UDFs and SQL with respect to performance, ease of use, flexibility and scalability. We profile UDFs based on call overhead, memory management and interleaved disk access. We show UDFs are faster than standard SQL aggregations and as fast as SQL arithmetic expressions.