Knowledge discovery through SysFor: a systematically developed forest of multiple decision trees

Authors:
Zahidul Islam;Helen Giggins
Affiliations:
Charles Sturt University, NSW, Australia;Newcastle University, Callaghan, Australia
Venue:
AusDM '11 Proceedings of the Ninth Australasian Data Mining Conference - Volume 121
Year:
2011

References (7):
C4.5: programs for machine learning
Data mining: concepts and techniques
Ensembles of Cascading Trees (ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining)
A comparative study of classification methods for microarray data analysis (AusDM '06 Proceedings of the fifth Australasian conference on Data mining and analystics - Volume 61)
A maximally diversified multiple decision tree algorithm for microarray data classification (WISB '06 Proceedings of the 2006 workshop on Intelligent systems for bioinformatics - Volume 73)
Improved use of continuous attributes in C4.5 (Journal of Artificial Intelligence Research)
Upper entropy of credal sets. Applications to credal classification (International Journal of Approximate Reasoning)

Cited by (2):
Evaluating the performance of several data mining methods for predicting irrigation water requirement (AusDM '12 Proceedings of the Tenth Australasian Data Mining Conference - Volume 134)
Missing value imputation using decision trees and decision forests by splitting and merging records: Two novel techniques (Knowledge-Based Systems)


Abstract

Decision tree based classification algorithms such as C4.5 and Explore build a single tree from a data set. The two main purposes of building a decision tree are to extract the various patterns (logic rules) that exist in a data set and to predict the class attribute value of an unlabeled record. Sometimes a set of decision trees, rather than a single tree, is generated from a data set. A set of multiple trees, when used wisely, typically has better prediction accuracy on unlabeled records. Existing multiple tree techniques are designed for high dimensional data sets and are therefore unable to build many trees from low dimensional data sets. In this paper we present a novel technique called SysFor that can build many trees even from a low dimensional data set. Another strength of the technique is that, instead of building multiple trees using any attribute (good or bad), it uses only those attributes that have high classification capabilities. We also present two novel voting techniques that predict the class value of an unlabeled record through the collective use of multiple trees. Experimental results demonstrate that SysFor is suitable for multiple pattern extraction and knowledge discovery from both low dimensional and high dimensional data sets by building a number of good quality decision trees. Moreover, its prediction accuracy is higher than that of several existing techniques that have previously been shown to perform well.
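
The abstract's two general ideas, building trees only from attributes with high classification capability and combining the trees' predictions by voting, can be illustrated with a minimal sketch. This is not the SysFor algorithm and does not reproduce either of the paper's voting techniques, which the abstract does not specify. It assumes categorical attributes, uses information gain as the measure of classification capability, substitutes one-level trees (stumps) for full decision trees, and combines them by plain majority vote. The function names, the `min_gain` threshold, the `max_trees` limit, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def partition_labels(records, labels, attr):
    """Group the class labels by the value each record takes for `attr`."""
    partitions = defaultdict(list)
    for record, label in zip(records, labels):
        partitions[record[attr]].append(label)
    return partitions


def information_gain(records, labels, attr):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    partitions = partition_labels(records, labels, attr)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def build_stump(records, labels, attr):
    """One-level tree: map each value of `attr` to its majority class."""
    partitions = partition_labels(records, labels, attr)
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    rules = {v: Counter(p).most_common(1)[0][0] for v, p in partitions.items()}
    return lambda record: rules.get(record[attr], default)


def build_forest(records, labels, min_gain=0.1, max_trees=3):
    """Build one stump per attribute whose gain reaches `min_gain`.

    Both parameters are hypothetical knobs for this sketch, not values
    taken from the paper.
    """
    gains = {a: information_gain(records, labels, a) for a in records[0]}
    good = sorted((a for a in gains if gains[a] >= min_gain),
                  key=gains.get, reverse=True)
    return [build_stump(records, labels, a) for a in good[:max_trees]]


def predict(forest, record):
    """Plain majority vote over the individual trees' predictions."""
    votes = Counter(tree(record) for tree in forest)
    return votes.most_common(1)[0][0]


# Invented toy data: three mildly informative attributes and one useless one.
rows = [
    ("sunny", "low", "calm", "x", "play"),
    ("sunny", "low", "strong", "x", "play"),
    ("sunny", "high", "calm", "y", "play"),
    ("rainy", "low", "calm", "y", "play"),
    ("sunny", "high", "strong", "x", "stay"),
    ("rainy", "low", "strong", "x", "stay"),
    ("rainy", "high", "calm", "y", "stay"),
    ("rainy", "high", "strong", "y", "stay"),
]
names = ("outlook", "humidity", "wind", "noise")
records = [dict(zip(names, row[:4])) for row in rows]
labels = [row[4] for row in rows]

forest = build_forest(records, labels)
unlabeled = {"outlook": "sunny", "humidity": "high", "wind": "calm", "noise": "y"}
print(len(forest), predict(forest, unlabeled))
```

In this toy data set each of the three retained attributes misclassifies two training records on its own, while the `noise` attribute has zero gain and is excluded. The majority vote classifies every training record correctly, which is the kind of benefit the abstract attributes to using multiple trees together.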