Tuned data mining: a benchmark study on different tuners

  • Authors:
  • Wolfgang Konen; Patrick Koch; Oliver Flasch; Thomas Bartz-Beielstein; Martina Friese; Boris Naujoks

  • Affiliations:
  • Cologne University of Applied Sciences, Gummersbach, Germany (all authors)

  • Venue:
  • Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation
  • Year:
  • 2011

Abstract

The complex, often redundant and noisy data in real-world data mining (DM) applications frequently lead to inferior results when out-of-the-box DM models are applied. Parameter tuning is essential to achieve high-quality results. In this work we aim to tune the parameters of the preprocessing and modeling phases jointly. The framework TDM (Tuned Data Mining) was developed to facilitate the search for good parameters and the comparison of different tuners. The results show that tuning is of great importance for high-quality results. Surrogate-model-based tuning with the Sequential Parameter Optimization Toolbox (SPOT) is compared with other tuners (CMA-ES, BFGS, LHD), and evidence is found that SPOT is well suited for this task. In benchmark tasks such as the Data Mining Cup (DMC), tuned models achieve remarkably better ranks than their untuned counterparts.