Effects of feature construction on classification performance: An empirical study in bank failure prediction

Authors:
Huimin Zhao; Atish P. Sinha; Wei Ge
Affiliations:
Sheldon B. Lubar School of Business, University of Wisconsin-Milwaukee, P.O. Box 742, Milwaukee, WI 53201-0742, USA (all three authors)
Venue:
Expert Systems with Applications: An International Journal
Year:
2009


Abstract

While extensive data mining research has been devoted to developing better classification algorithms, relatively little has examined how feature construction guided by domain knowledge affects classification performance. Yet in many application domains, domain knowledge can be used to construct higher-level features that may improve performance. For example, past research and regulatory practice in early warning of bank failures have produced various explanatory variables, in the form of financial ratios, that are constructed from bank accounting variables and are believed to be more effective than the original variables in identifying potential problem banks. In this study, we empirically compare two sets of classifiers for bank failure prediction, one built using raw accounting variables and the other built using constructed financial ratios. Four popular data mining methods are used to learn the classifiers: logistic regression, decision tree, neural network, and k-nearest neighbor. We evaluate the classifiers on the basis of expected misclassification cost under a wide range of possible settings. The results strongly indicate that feature construction guided by domain knowledge significantly improves classifier performance and that the degree of improvement varies significantly across the methods.
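
The two ideas the abstract relies on, constructing ratios from raw accounting variables and scoring a classifier by expected misclassification cost, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not list the variables or ratios used in the study, so the field names (`total_assets`, `total_equity`, `net_income`, `total_loans`) and the three ratios are hypothetical, and the cost formula is the common textbook form (prior-weighted error rates times unit costs), which may differ from the paper's exact definition.

```python
from typing import Dict, Sequence, Tuple


def construct_ratios(raw: Dict[str, float]) -> Dict[str, float]:
    """Build illustrative financial ratios from raw accounting variables.

    The input keys and the chosen ratios are hypothetical examples,
    not the feature set used in the study.
    """
    assets = raw["total_assets"]
    if assets <= 0:
        raise ValueError("total_assets must be positive")
    return {
        "equity_to_assets": raw["total_equity"] / assets,
        "return_on_assets": raw["net_income"] / assets,
        "loans_to_assets": raw["total_loans"] / assets,
    }


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (false_negative_rate, false_positive_rate); label 1 = failed bank."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn / positives, fp / negatives


def expected_misclassification_cost(
    fn_rate: float,
    fp_rate: float,
    prior_fail: float,
    cost_fn: float,
    cost_fp: float,
) -> float:
    """Assumed textbook form: P(fail)*FNR*C_FN + (1 - P(fail))*FPR*C_FP."""
    return prior_fail * fn_rate * cost_fn + (1.0 - prior_fail) * fp_rate * cost_fp


if __name__ == "__main__":
    bank = {
        "total_assets": 200.0,
        "total_equity": 20.0,
        "net_income": 2.0,
        "total_loans": 120.0,
    }
    print(construct_ratios(bank))

    fnr, fpr = error_rates([1, 1, 0, 0, 0, 0], [1, 0, 0, 1, 0, 0])
    # Sweep a few hypothetical cost ratios, echoing the idea of evaluating
    # under a range of settings rather than a single operating point.
    for cost_fn in (1.0, 10.0, 50.0):
        cost = expected_misclassification_cost(fnr, fpr, 0.02, cost_fn, 1.0)
        print(cost_fn, cost)
```

Computing the cost over several prior and cost-ratio settings, as in the loop above, is one way to compare a raw-variable classifier with a ratio-based classifier without committing to a single assumed cost of missing a failing bank.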