On sensitivity of case-based reasoning to optimal feature subsets in business failure prediction

  • Authors:
  • Hui Li, Hai-Bin Huang, Jie Sun, Chuang Lin

  • Affiliations:
  • School of Business Administration, Zhejiang Normal University, 91 Subbox in P.O. Box 62, YingBinDaDao 688, Jinhua City 321004, Zhejiang Province, PR China (Hui Li, Hai-Bin Huang, Jie Sun); School of Software, Dalian University of Technology, Dalian City 116020, Liaoning Province, PR China (Chuang Lin)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2010

Abstract

Case-based reasoning (CBR) was first introduced into the area of business failure prediction (BFP) in 1996. The conclusion drawn from its first application in this area was that CBR is no more applicable than multiple discriminant analysis (MDA) and Logit. In contrast, some arguments claim that CBR, with k-nearest neighbor (k-NN) at its heart, is not necessarily outranked by those machine learning techniques. In this research, we investigate whether CBR is sensitive to so-called optimal feature subsets in BFP, since the feature subset is an important factor accounting for CBR's performance. When CBR is used to solve such a classification problem, the retrieval process of its life-cycle is chiefly used. We use the classical Euclidean metric to calculate case similarity. Empirical data two years prior to failure were collected from the Shanghai Stock Exchange and the Shenzhen Stock Exchange in China. Four filters, i.e. the MDA stepwise method, the Logit stepwise method, one-way ANOVA, and the independent-samples t-test, together with the wrapper approach of a genetic algorithm, are employed to generate five optimal feature subsets after data normalization. A thirty-times hold-out method, combining leave-one-out cross-validation with the hold-out method, is used to assess predictive performance. The two statistical baseline models, i.e. MDA and Logit, and the newer model of support vector machine (SVM) are employed as comparative models. Empirical results indicate that CBR is indeed sensitive to optimal feature subsets with data for medium-term BFP. The stepwise method of MDA, a filter approach, is the first choice for CBR to select optimal feature subsets, followed by the stepwise method of Logit and the wrapper; the two filter approaches of ANOVA and the t-test rank fourth. If the MDA stepwise method is employed to select the optimal feature subset for the CBR system, there is no significant difference in predictive performance for medium-term BFP between CBR and the other three models, i.e. MDA, Logit, and SVM. On the contrary, CBR is outperformed by the three models at the 1% significance level if ANOVA or the t-test is used as the feature selection method for CBR.
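
To make the retrieval step concrete, the sketch below illustrates, under stated assumptions, the kind of computation the abstract describes: k-NN retrieval over min-max-normalized features with the Euclidean metric, followed by a majority vote among the retrieved cases, queried in leave-one-out fashion. The feature values, labels, feature names, and the choice of k are hypothetical illustrations, not the authors' data or implementation.

    # Minimal sketch of Euclidean-metric k-NN retrieval for CBR-style BFP.
    # All numbers below are hypothetical; the paper's actual case base,
    # selected feature subsets, and parameter settings are not reproduced here.
    import numpy as np

    def min_max_normalize(X):
        # Scale each feature (column) to [0, 1], as in the abstract's
        # data normalization step; constant columns are left at 0.
        lo, hi = X.min(axis=0), X.max(axis=0)
        span = np.where(hi - lo == 0, 1.0, hi - lo)
        return (X - lo) / span

    def retrieve_and_vote(case_base, labels, query, k=3):
        # Retrieve the k most similar past cases by Euclidean distance
        # and predict failure (1) / non-failure (0) by majority vote.
        dists = np.linalg.norm(case_base - query, axis=1)
        nearest = np.argsort(dists)[:k]
        return int(labels[nearest].sum() * 2 >= k)

    # Hypothetical firms described by a small "optimal" feature subset
    # (e.g., ratios picked by stepwise MDA); label 1 = failed, 0 = healthy.
    X = np.array([[0.12, 1.8, 0.35],
                  [0.05, 0.9, 0.60],
                  [0.30, 2.5, 0.20],
                  [0.02, 0.7, 0.75],
                  [0.25, 2.1, 0.25],
                  [0.04, 0.8, 0.70]])
    y = np.array([0, 1, 0, 1, 0, 1])

    Xn = min_max_normalize(X)
    # Leave-one-out: each firm is held out as the query case in turn,
    # with the remaining firms forming the case base.
    for i in range(len(Xn)):
        case_base = np.delete(Xn, i, axis=0)
        base_labels = np.delete(y, i)
        pred = retrieve_and_vote(case_base, base_labels, Xn[i], k=3)
        print(f"firm {i}: predicted {pred}, actual {y[i]}")

The sketch covers only the retrieval phase of the CBR life-cycle, which the abstract identifies as the part chiefly used for this classification task; the feature selection filters and wrapper would run upstream to decide which columns of the case base are kept.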