A data mining system based on SQL queries and UDFs for relational databases

  • Authors:
  • Carlos Ordonez;Carlos Garcia-Alvarado

  • Affiliations:
  • University of Houston, Houston, TX, USA;University of Houston, Houston, TX, USA

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Most research on data mining has proposed algorithms and optimizations that work on flat files, outside a DBMS, mainly due to the following reasons. It is easier to develop efficient algorithms in a traditional programming language. The integration of data mining algorithms into a DBMS is difficult given its relational model foundation and system architecture. Moreover, SQL may be slow and cumbersome for numerical analysis computations. Therefore, data mining users commonly export data sets outside the DBMS for data mining processing, which creates a performance bottleneck and eliminates important data management capabilities such as query processing and security, among others (e.g. concurrency control and fault tolerance). With that motivation in mind, we developed a novel system based on SQL queries and User-Defined Functions (UDFs) that can directly analyze relational tables to compute statistical models, storing such models as relational tables as well. Most algorithms have been optimized to reduce the number of passes on the data set. Our system can analyze large and high dimensional data sets faster than external data mining tools.