Improving classifier performance by knowledge-driven data preparation

Authors:
Laura Welcker; Stephan Koch; Frank Dellmann
Affiliations:
Münster University of Applied Sciences, Münster, Germany; BBDO Proximity GmbH, Hamburg, Germany; Münster University of Applied Sciences, Münster, Germany
Venue:
ICDM'12 Proceedings of the 12th Industrial conference on Advances in Data Mining: applications and theoretical aspects
Year:
2012

Abstract

Classification is a widely used technique in data mining, and achieving reasonable classifier performance is an increasingly important goal. This paper aims to show empirically how classifier performance can be improved by knowledge-driven data preparation that draws on business, data and methodological know-how. To illustrate the variety of knowledge-driven approaches, we first introduce an advanced framework that breaks the data preparation phase of the CRISP-DM process model down into four hierarchy levels. The first three levels reflect methodological knowledge; the last level clarifies the use of business and data know-how. Furthermore, we present insights from a case study that show the effect of variable derivation, a subtask of data preparation. The impact of nine derivation approaches and four combinations of them on classifier performance is assessed on a real-world dataset using decision trees, with gains charts as the performance measure. The results indicate that our approach improves classifier performance.
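The abstract names gains charts as the performance measure. As a hedged illustration (not the authors' code), the sketch below computes a cumulative gains curve in plain Python: records are ranked by a classifier's score from most to least likely positive, the ranking is cut at equal-sized depths, and each point reports what fraction of all positives falls inside that depth. The function name `cumulative_gains` and its `n_bins` parameter are hypothetical; any scoring model, for example a decision tree's predicted class probabilities, could supply `scores`.

```python
from typing import List, Sequence, Tuple


def cumulative_gains(scores: Sequence[float], labels: Sequence[int],
                     n_bins: int = 10) -> List[Tuple[float, float]]:
    """Hypothetical helper: return (depth, gain) pairs for a gains chart.

    depth = fraction of the population taken from the top of the ranking,
    gain  = fraction of all positive records (label 1) captured at that depth.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank records by descending score; sorted() is stable, so ties keep
    # their original order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)

    points = []
    for b in range(1, n_bins + 1):
        # Number of top-ranked records included at this depth.
        cutoff = round(n * b / n_bins)
        captured = sum(label for _, label in ranked[:cutoff])
        points.append((b / n_bins, captured / total_pos))
    return points


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
    for depth, gain in cumulative_gains(scores, labels, n_bins=5):
        print(f"top {depth:.0%}: {gain:.0%} of positives captured")
```

To compare a model trained with derived variables against a baseline, one could compute both curves on the same holdout set and read off the gain at a fixed depth (e.g. the top 10% or 20%); the abstract does not state which depth the authors use.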