This paper focuses on hierarchical classification problems where the classes to be predicted are organized in the form of a tree. The standard top-down, divide-and-conquer approach to hierarchical classification builds a hierarchy of classifiers, with one classifier for each internal (non-leaf) node of the class tree. Each classifier discriminates only between the child classes of its node. Once the tree of classifiers is built, the system classifies test examples one class level at a time, so that when an example is assigned a class at a given level, only that class's children need to be considered at the next level. The drawback of this approach is that a test example misclassified at a given class level will also be misclassified at all deeper levels. In this paper we propose hierarchical classification methods to mitigate this drawback. More precisely, we propose a method called hierarchical ensemble of hierarchical rule sets (HEHRS), in which different ensembles are built at different levels of the class tree, and each ensemble consists of different rule sets built from training examples at different levels of the class tree. We also use a particle swarm optimisation (PSO) algorithm to optimise the rule weights that HEHRS uses to combine the predictions of different rules into the class assigned to a given test example. In addition, we propose a variant of a method to mitigate the aforementioned drawback of top-down classification. These three types of methods are compared against the standard top-down hierarchical classification method on six challenging bioinformatics datasets involving the prediction of protein function. Overall, HEHRS with rule weights optimised by the PSO algorithm obtains the best predictive accuracy of the four types of hierarchical classification methods.
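The standard top-down procedure described in the abstract, and why its errors propagate, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy two-level class tree `TREE`, the threshold classifiers in `CLASSIFIERS`, and the function `classify_top_down` are all hypothetical, chosen only to show one classifier per internal node and level-by-level descent.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical toy class tree: each internal node maps to its child classes.
TREE: Dict[str, List[str]] = {
    "root": ["A", "B"],
    "A": ["A.1", "A.2"],
    "B": ["B.1", "B.2"],
}

# One classifier per internal node; each chooses only among that node's children.
# These threshold rules on a 2-feature example are illustrative assumptions.
Classifier = Callable[[Sequence[float]], str]

CLASSIFIERS: Dict[str, Classifier] = {
    "root": lambda x: "A" if x[0] < 0.5 else "B",
    "A": lambda x: "A.1" if x[1] < 0.5 else "A.2",
    "B": lambda x: "B.1" if x[1] < 0.5 else "B.2",
}


def classify_top_down(
    x: Sequence[float],
    tree: Dict[str, List[str]],
    classifiers: Dict[str, Classifier],
    root: str = "root",
) -> List[str]:
    """Assign one class per level, descending only into the chosen class's children."""
    path: List[str] = []
    node = root
    while node in tree:  # a node absent from `tree` is a leaf class
        child = classifiers[node](x)
        if child not in tree[node]:
            raise ValueError(f"classifier at {node!r} returned non-child {child!r}")
        path.append(child)
        node = child
    return path


if __name__ == "__main__":
    print(classify_top_down((0.4, 0.7), TREE, CLASSIFIERS))
```

Suppose the example `(0.4, 0.7)` truly belongs to `B.2`. The root classifier picks `A`, so only `A.1` and `A.2` are considered next and the returned path is `["A", "A.2"]`. The classifier at node `B` is never consulted, so the first-level error necessarily carries down to the leaf, which is the drawback the proposed methods aim to mitigate.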
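The abstract does not specify how HEHRS combines rule predictions or how its PSO is configured, so the following is only a hedged sketch of the general idea: a weighted rule vote whose weights are tuned by particle swarm optimisation. Assumptions (none taken from the paper): each rule either predicts one class or abstains (`None`); the predicted class is the one with the largest summed weight among firing rules; the PSO is a textbook global-best variant with inertia, with weights clipped to [0, 1]; fitness is accuracy on the given examples. All names (`weighted_vote`, `accuracy`, `pso_optimise_weights`, `RULES`, `DATA`) and parameter values are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A rule returns a class label, or None when it does not fire (assumption).
Rule = Callable[[Sequence[float]], Optional[str]]
Example = Tuple[Sequence[float], str]


def weighted_vote(x: Sequence[float], rules: Sequence[Rule],
                  weights: Sequence[float], default: str = "unknown") -> str:
    """Sum weights of firing rules per class; ties go to the class that fired first."""
    scores: Dict[str, float] = defaultdict(float)
    for rule, w in zip(rules, weights):
        label = rule(x)
        if label is not None:
            scores[label] += w
    if not scores:
        return default  # no rule fired
    return max(scores, key=scores.get)


def accuracy(weights: Sequence[float], rules: Sequence[Rule], data: Sequence[Example]) -> float:
    hits = sum(weighted_vote(x, rules, weights) == y for x, y in data)
    return hits / len(data)


def pso_optimise_weights(rules: Sequence[Rule], data: Sequence[Example],
                         n_particles: int = 10, n_iters: int = 30, inertia: float = 0.7,
                         c1: float = 1.5, c2: float = 1.5, seed: int = 0
                         ) -> Tuple[List[float], float]:
    """Global-best PSO over rule weights in [0, 1], maximising accuracy on `data`."""
    rng = random.Random(seed)
    dim = len(rules)
    # One particle starts at uniform weights, so the result is never worse than an unweighted vote.
    positions = [[1.0] * dim] + [[rng.random() for _ in range(dim)] for _ in range(n_particles - 1)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]
    best_fit = [accuracy(p, rules, data) for p in positions]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    gbest_pos, gbest_fit = best_pos[g][:], best_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (inertia * velocities[i][d]
                                    + c1 * r1 * (best_pos[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest_pos[d] - positions[i][d]))
                # Move the particle and keep each weight inside [0, 1].
                positions[i][d] = min(1.0, max(0.0, positions[i][d] + velocities[i][d]))
            fit = accuracy(positions[i], rules, data)
            if fit > best_fit[i]:  # update personal best, then global best
                best_fit[i], best_pos[i] = fit, positions[i][:]
                if fit > gbest_fit:
                    gbest_fit, gbest_pos = fit, positions[i][:]
    return gbest_pos, gbest_fit


# Hypothetical toy rules and labelled examples.
RULES: List[Rule] = [
    lambda x: "A" if x[0] < 0.5 else None,
    lambda x: "B" if x[1] > 0.5 else None,
]
DATA: List[Example] = [((0.2, 0.1), "A"), ((0.8, 0.9), "B"), ((0.3, 0.7), "B"), ((0.4, 0.2), "A")]

if __name__ == "__main__":
    print(accuracy([1.0, 1.0], RULES, DATA), pso_optimise_weights(RULES, DATA))
```

With uniform weights the two toy rules tie on `(0.3, 0.7)` and the vote goes to `A`, giving 0.75 accuracy; any weighting with the second rule heavier than the first resolves that conflict. Seeding one particle at uniform weights is a design choice that makes the optimised weights at least as accurate as an unweighted vote on the fitness data, though a real system would measure accuracy on held-out examples.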