Massively parallel in-database predictions using PMML

  • Authors: Kaushik K. Das; Eugene Fratkin; Aleksander Gorajek; Konstantinos Stathatos; Maulin Gajjar
  • Affiliations: EMC/Greenplum, San Mateo, CA, USA (Das, Fratkin, Gorajek); Zementis, Inc., San Diego, CA, USA (Stathatos, Gajjar)
  • Venue: Proceedings of the 2011 Workshop on Predictive Markup Language Modeling
  • Year: 2011

Abstract

As an open standard, the Predictive Model Markup Language (PMML) enables interoperability and portability in data mining and predictive analytics: a model developed in one environment and tool set can be deployed and used in a completely different system. This flexibility creates new opportunities for meeting demanding business agility and performance requirements. One such requirement is the urgent need to apply predictive analytics to the vast amounts of data that many organizations collect, in order to derive reliable predictions and, from them, business decisions. In this paper, we discuss how PMML enables embedding advanced predictive models directly in the database or data warehouse, alongside the data to be scored. More importantly, we show how highly parallel database architectures can be used to derive predictions efficiently from very large volumes of data.
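
The idea of scoring data where it lives, one segment at a time, can be illustrated with a minimal Python sketch. It is not the implementation described in the paper: the PMML fragment is a simplified regression model with the XML namespace omitted, the function names (`load_model`, `score_segment`, `parallel_score`) are hypothetical, and a thread pool stands in for the segments of a parallel database, each of which would score only its local rows.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Simplified PMML regression model (assumption: namespace and
# DataDictionary/MiningSchema omitted to keep the sketch short).
PMML_DOC = """
<PMML version="4.0">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_model(pmml_text):
    """Parse the intercept and per-field coefficients from the PMML text."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("./RegressionModel/RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("NumericPredictor")
    }
    return intercept, coefficients


def score_row(model, row):
    """Apply the linear model to one row (a dict of field name -> value)."""
    intercept, coefficients = model
    return intercept + sum(c * row[name] for name, c in coefficients.items())


def score_segment(model, rows):
    """Score every row held by one segment (hypothetical data partition)."""
    return [score_row(model, row) for row in rows]


def parallel_score(pmml_text, segments):
    """Load the model once, then score each segment concurrently."""
    model = load_model(pmml_text)
    workers = max(1, len(segments))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda seg: score_segment(model, seg), segments)
        return [list(r) for r in results]


if __name__ == "__main__":
    segments = [
        [{"x1": 1, "x2": 2}],
        [{"x1": 0, "x2": 0}, {"x1": 2, "x2": 4}],
    ]
    print(parallel_score(PMML_DOC, segments))
```

Because the model is a small, self-describing document, it can be shipped to every segment, and only the predictions need to move. In this sketch the rows never leave the partition that holds them, which is the property the abstract attributes to in-database scoring.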