Efficient sampling and handling of variance in tuning data mining models

  • Authors:
  • Patrick Koch; Wolfgang Konen

  • Affiliations:
  • Department of Computer Science, Cologne University of Applied Sciences, Gummersbach, Germany; Department of Computer Science, Cologne University of Applied Sciences, Gummersbach, Germany

  • Venue:
  • PPSN'12 Proceedings of the 12th international conference on Parallel Problem Solving from Nature - Volume Part I
  • Year:
  • 2012

Abstract

Computational Intelligence (CI) provides good and robust solutions for global optimization. CI methods are especially suited to difficult parameter optimization tasks in which the fitness function is noisy. Such fitness landscapes frequently arise in real-world applications such as Data Mining (DM). Unfortunately, parameter tuning in DM is computationally expensive, and CI-based methods often require many function evaluations before they converge to good solutions. Earlier studies have shown that surrogate models can reduce the number of real function evaluations. However, each function evaluation remains time-consuming. In this paper we investigate whether and how the fitness landscape of the parameter space changes when fewer observations are used to train the models during tuning. A representative study on seven DM tasks shows that tuning with this reduced training data still yields competitive results. On all these tasks, 10-15% of the training data is sufficient, which reduces the computation time by a factor of 6 to 10.
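The core idea, evaluating each candidate parameter setting on a model trained with only a fraction of the training data, can be illustrated with a minimal sketch. It is not the authors' setup: it swaps in a plain grid search over the neighborhood size k of a hand-written k-NN classifier on a hypothetical synthetic dataset, and the names `make_data`, `knn_errors` and `tune_on_fraction` are illustrative assumptions. Cost is counted as distance computations, a rough stand-in for wall-clock time. Under the assumption that this cost grows linearly with the training-set size, a fraction f gives a speedup of about 1/f, so 10-15% corresponds to roughly 6.7-10, in line with the reported factor of 6 to 10.

```python
import random

def make_data(n, noise=0.1, seed=0):
    """Hypothetical noisy 1-D binary task standing in for a real DM dataset."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if x > 0 else 0
        if rng.random() < noise:  # label noise makes the tuning objective noisy
            y = 1 - y
        data.append((x, y))
    return data

def knn_errors(train, valid, candidates):
    """Validation error rate of k-NN for every k in candidates (odd k avoids ties)."""
    kmax = max(candidates)
    wrong = {k: 0 for k in candidates}
    for x, y in valid:
        # Sort once per validation point and reuse the prefix for every k.
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:kmax]
        for k in candidates:
            votes = sum(label for _, label in nearest[:k])
            prediction = 1 if 2 * votes > k else 0
            wrong[k] += prediction != y
    return {k: wrong[k] / len(valid) for k in candidates}

def tune_on_fraction(train, valid, candidates, fraction, seed=0):
    """Choose k using only a random `fraction` of the training data.

    Returns (best_k, errors, cost); cost counts distance computations,
    a rough proxy for training/evaluation time (an assumption of this sketch).
    """
    rng = random.Random(seed)
    m = max(max(candidates), int(fraction * len(train)))  # keep at least kmax points
    subsample = rng.sample(train, m)
    errors = knn_errors(subsample, valid, candidates)
    best_k = min(errors, key=errors.get)
    cost = len(subsample) * len(valid)
    return best_k, errors, cost

# Usage: compare tuning on 10% of the training data with tuning on all of it.
data = make_data(2000)
train, valid = data[:1500], data[1500:]
ks = [1, 3, 5, 9, 15, 25, 41]
k_sub, err_sub, cost_sub = tune_on_fraction(train, valid, ks, fraction=0.10)
k_full, err_full, cost_full = tune_on_fraction(train, valid, ks, fraction=1.0)
print(k_sub, err_sub[k_sub], k_full, err_full[k_full], cost_full / cost_sub)
```

Here the cost ratio is exactly 1 / fraction = 10; whether the k chosen on the subsample is as good as the one chosen on all data is what the paper examines empirically across its seven DM tasks.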