An Efficient Algorithm for Generating Generalized Decision Forests

Authors:
H. Zhao; A. P. Sinha
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
Year:
2005

Abstract

A shortcoming of univariate decision tree learners is that they do not learn intermediate concepts and select only one of the input features for the branching decision at each intermediate tree node. It has been empirically demonstrated that cascading decision tree learners with other classification methods that do learn intermediate concepts can alleviate this representational bias and potentially improve classification performance. However, a more complex model that fits the training data better does not necessarily perform better on unseen data, a problem commonly referred to as overfitting. To find the most appropriate degree of such cascade generalization, a decision forest (i.e., a set of decision trees with other classification models cascaded to different degrees) needs to be generated, from which the best decision tree can then be identified. In this paper, the authors propose an efficient algorithm for generating such decision forests. The algorithm uses an extended decision tree data structure and constructs any node that is common to multiple decision trees only once. The authors empirically evaluated the algorithm on 32 classification data sets from the University of California, Irvine (UCI) machine learning repository and report results demonstrating its efficiency.
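The abstract does not spell out the algorithm, so the following is only a minimal sketch of the node-sharing idea, not the authors' extended decision tree data structure. It assumes, hypothetically, that a tree of cascade degree k may split on the cascaded features (e.g., another model's predictions appended as columns) only at depths below k, and it uses ID3-style information-gain splits on categorical features. Nodes are memoized by the pair (training rows reaching the node, remaining cascade depth), so a subtree common to several trees in the forest is built once. `ForestBuilder`, `tree`, `remaining`, and the toy data are illustrative names and values, not from the paper.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(eq=False)
class Node:
    label: int                        # majority class of the rows at this node
    feature: Optional[int] = None     # split feature index; None for a leaf
    children: Dict[int, "Node"] = field(default_factory=dict)


def entropy(labels: Sequence[int]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


class ForestBuilder:
    """Builds trees of cascade degree k = 0..K, reusing shared nodes.

    Hypothetical convention: a degree-k tree may use the cascaded
    features only at depths < k (k = 0 is a plain univariate tree).
    """

    def __init__(self, X, y, base, cascaded):
        self.X, self.y = X, y
        self.base, self.cascaded = tuple(base), tuple(cascaded)
        # (row indices, remaining cascade depth) -> already-built node
        self.cache: Dict[Tuple[Tuple[int, ...], int], Node] = {}

    def tree(self, k: int) -> Node:
        return self._build(tuple(range(len(self.y))), k)

    def _partition(self, rows, f):
        groups: Dict[int, List[int]] = {}
        for r in rows:
            groups.setdefault(self.X[r][f], []).append(r)
        return {v: tuple(groups[v]) for v in sorted(groups)}

    def _build(self, rows, remaining):
        key = (rows, remaining)
        if key in self.cache:          # node common to several trees: reuse it
            return self.cache[key]
        labels = [self.y[r] for r in rows]
        node = Node(label=Counter(labels).most_common(1)[0][0])
        feats = self.base + (self.cascaded if remaining > 0 else ())
        parent_h = entropy(labels)

        def gain(f):                   # information gain of splitting on f
            parts = self._partition(rows, f).values()
            return parent_h - sum(
                len(p) / len(rows) * entropy([self.y[r] for r in p]) for p in parts)

        if parent_h > 0 and feats:
            best = max(sorted(feats), key=gain)  # ties -> lowest feature index
            if gain(best) > 1e-12:
                node.feature = best
                for v, sub in self._partition(rows, best).items():
                    node.children[v] = self._build(sub, max(remaining - 1, 0))
        self.cache[key] = node
        return node


def count_nodes(node: Node) -> int:
    # Counts nodes as if each tree were stored separately.
    return 1 + sum(count_nodes(c) for c in node.children.values())


# Toy data: columns a, b, c, where c stands in for a cascaded model's output.
X = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
y = [0, 0, 0, 1]
fb = ForestBuilder(X, y, base=(0, 1), cascaded=(2,))
forest = [fb.tree(k) for k in range(3)]
print(len(fb.cache), sum(count_nodes(t) for t in forest))  # → 10 11
print(forest[1].children[1] is forest[0].children[1].children[1])  # → True
```

On this toy data the degree-0 tree needs two splits on a and b, while the degree-1 and degree-2 trees split once on the cascaded column c. The leaf reached by rows where c = 1 with no cascade depth left is the same object in the degree-0 and degree-1 trees, so 10 nodes are built instead of 11. Keying on the remaining cascade depth is a design choice of this sketch: two nodes are merged only when both their training rows and their future feature availability agree, which guarantees the shared subtrees are identical.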