One-pass data mining algorithms in a DBMS with UDFs

  • Authors: Carlos Ordonez; Sasi K. Pitchaimalai
  • Affiliations: University of Houston, Houston, TX, USA (both authors)
  • Venue: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
  • Year: 2011

Abstract

Data mining research is extensive, but most work has proposed efficient algorithms, data structures and optimizations that work outside a DBMS, mostly on flat files. In contrast, we present a data mining system that can work on top of a relational DBMS based on a combination of SQL queries and User-Defined Functions (UDFs), debunking the common perception that SQL is inefficient or inadequate for data mining. We show our system can analyze large data sets significantly faster than external data mining tools. Moreover, our UDF-based algorithms can process a data set in one pass and have linear scalability.
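
To make the idea of a one-pass UDF concrete, here is a minimal sketch. It is not the authors' system. It assumes SQLite, accessed through Python's standard sqlite3 module, as a stand-in DBMS, and the names SummaryStats, summary_stats, points and summarize are hypothetical. The aggregate UDF sees each row exactly once and keeps only a count, a sum and a sum of squares, so its work grows linearly with the number of rows.

```python
import sqlite3


class SummaryStats:
    """Hypothetical aggregate UDF: one pass, constant-size state."""

    def __init__(self):
        self.n = 0           # number of non-NULL rows seen
        self.total = 0.0     # running sum of values
        self.total_sq = 0.0  # running sum of squared values

    def step(self, value):
        # Called by the DBMS once per row; NULLs are skipped.
        if value is None:
            return
        self.n += 1
        self.total += value
        self.total_sq += value * value

    def finalize(self):
        # A SQLite aggregate returns one scalar, so the three
        # results are packed into a comma-separated string.
        if self.n == 0:
            return None
        mean = self.total / self.n
        variance = self.total_sq / self.n - mean * mean  # population variance
        return f"{self.n},{mean},{variance}"


def summarize(values):
    """Load values into an in-memory table and summarize them with one scan."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("summary_stats", 1, SummaryStats)
    conn.execute("CREATE TABLE points (x REAL)")
    conn.executemany("INSERT INTO points VALUES (?)", [(v,) for v in values])
    (packed,) = conn.execute("SELECT summary_stats(x) FROM points").fetchone()
    conn.close()
    if packed is None:
        return 0, None, None
    n, mean, variance = packed.split(",")
    return int(n), float(mean), float(variance)


if __name__ == "__main__":
    print(summarize([1.0, 2.0, 3.0, 4.0]))
```

The statistics are computed inside the database engine by a single SELECT, so the data never has to be exported to an external tool. Deriving the variance from running sums is a simplification that can lose precision on large or poorly scaled data.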