Classification is a widely used technique in data mining, and achieving reasonable classifier performance is an increasingly important goal. This paper aims to show empirically how classifier performance can be improved by knowledge-driven data preparation that draws on business, data, and methodological know-how. To illustrate the variety of knowledge-driven approaches, we first introduce an advanced framework that breaks the data preparation phase of the CRISP-DM process model down into four hierarchy levels. The first three levels reflect methodological knowledge; the last level clarifies the use of business and data know-how. We then present insights from a case study that shows the effect of variable derivation, a subtask of data preparation. The impact of 9 derivation approaches and 4 combinations of them on classifier performance is assessed on a real-world dataset, using decision trees as classifiers and gains charts as the performance measure. The results indicate that our approach improves classifier performance.
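
The abstract does not say how the gains charts were computed, so the following is only a minimal sketch of one common way to obtain the points of a cumulative gains chart from classifier scores. The function name `cumulative_gains`, the `n_bins` parameter, and the toy labels and scores are illustrative assumptions, not taken from the paper.

```python
def cumulative_gains(y_true, scores, n_bins=10):
    """Cumulative share of positives captured after each of n_bins score-ranked groups.

    Assumption: y_true holds 0/1 labels and a higher score means the
    classifier considers the case more likely to be positive.
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank the cases from highest to lowest score and keep only the labels.
    ranked = [label for _, label in
              sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)]

    n = len(ranked)
    gains = []
    for b in range(1, n_bins + 1):
        cutoff = (b * n) // n_bins  # number of cases in the top b groups
        gains.append(sum(ranked[:cutoff]) / total_positives)
    return gains


# Hypothetical toy data: 3 positives among 10 scored cases.
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
gains = cumulative_gains(labels, scores, n_bins=5)
```

Plotting `gains` against the fraction of cases contacted, next to the diagonal of a random ranking, gives the chart; a curve that rises faster for a classifier trained on derived variables would indicate an improvement of the kind the abstract reports.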