Improving classifier performance by knowledge-driven data preparation

  • Authors:
  • Laura Welcker;Stephan Koch;Frank Dellmann

  • Affiliations:
  • Münster University of Applied Sciences, Münster, Germany;BBDO Proximity GmbH, Hamburg, Germany;Münster University of Applied Sciences, Münster, Germany

  • Venue:
  • ICDM'12 Proceedings of the 12th Industrial conference on Advances in Data Mining: applications and theoretical aspects
  • Year:
  • 2012

Abstract

Classification is a widely used technique in data mining, and achieving reasonable classifier performance is therefore an increasingly important goal. This paper aims to show empirically how classifier performance can be improved by knowledge-driven data preparation that draws on business, data and methodological know-how. To point out the variety of knowledge-driven approaches, we first introduce an advanced framework that breaks down the data preparation phase of the CRISP-DM process model into four hierarchy levels. The first three levels reflect methodological knowledge; the last level clarifies the use of business and data know-how. Furthermore, we present insights from a case study to show the effect of variable derivation as a subtask of data preparation. The impact of nine derivation approaches and four combinations of them on classifier performance is assessed on a real-world dataset, using decision trees as the classifier and gains charts as the performance measure. The results indicate that our approach improves classifier performance.
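
The gains-chart comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name cumulative_gains, the toy labels and both score lists are hypothetical, and the scores merely stand in for the output of a classifier (for example a decision tree) trained without and with a derived variable. The sketch ranks records by score and reports the share of all positives captured in each cumulative bin.

```python
def cumulative_gains(y_true, scores, n_bins=10):
    """Share of all positives captured in the top 1/n_bins, 2/n_bins, ... of
    records when records are ranked by descending score."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank labels by descending score (stable for tied scores).
    ranked = [y for _, y in sorted(zip(scores, y_true), key=lambda p: -p[0])]
    n = len(ranked)

    gains = []
    for b in range(1, n_bins + 1):
        cutoff = (b * n) // n_bins  # number of records in the top b bins
        gains.append(sum(ranked[:cutoff]) / total_pos)
    return gains


# Hypothetical toy data: 10 records, 3 positives.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Assumed scores from a model on the original variables only.
scores_base = [0.9, 0.2, 0.4, 0.8, 0.3, 0.7, 0.1, 0.6, 0.5, 0.05]
# Assumed scores from a model that also uses a derived variable.
scores_derived = [0.9, 0.7, 0.8, 0.6, 0.3, 0.2, 0.1, 0.4, 0.5, 0.05]

gains_base = cumulative_gains(y, scores_base, n_bins=5)
gains_derived = cumulative_gains(y, scores_derived, n_bins=5)

print("baseline:", gains_base)
print("derived: ", gains_derived)
```

A gains curve that rises faster in the early bins means more positives are found among the highest-scored records, which is the sense in which one data preparation variant can be said to outperform another under this measure.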