Hybrid random subsample classifier ensemble for high dimensional data sets

Authors:
Santhosh Pathical; Gursel Serpen
Affiliations:
Electrical Engineering and Computer Science Department, University of Toledo, Toledo, OH, USA (both authors)
Venue:
International Journal of Hybrid Intelligent Systems
Year:
2012

Abstract

This paper presents a comparative performance evaluation of a hybrid random subsample classifier ensemble against leading machine learning classifiers on high-dimensional datasets. The classification performance of the hybrid random subsample ensemble is compared with that of a comprehensive set of machine learning classification algorithms, using both in-house simulations and results published by others in the literature. The comparison is based on prediction accuracy on six datasets from the UCI Machine Learning Repository, namely Dexter, Madelon, Isolet, Multiple Features, Internet Ads, and Citeseer, with feature counts of up to 105,000. Simulation results establish that the hybrid random subsample ensemble performs competitively on high-dimensional datasets. Specifically, the findings indicate that hybrid random subsample ensembles with a subsample rate of 15% and 25 or more base classifiers can achieve performance highly competitive with that of leading machine learning classifier algorithms on high-dimensional datasets.
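The abstract does not spell out how the ensemble is built. What follows is a minimal, dependency-free Python sketch. It is not the authors' implementation and rests on three assumptions. First, the "subsample rate" is taken to be the fraction of input features that each base classifier is trained on. Second, member predictions are combined by simple majority vote. Third, "hybrid" is taken to mean that the base classifiers come from more than one learner type. The two base learners (nearest centroid and 1-nearest-neighbor), the class names `HybridRandomSubsampleEnsemble`, `NearestCentroid`, and `OneNearestNeighbor`, and the toy data are hypothetical stand-ins. Only the defaults of 25 members and a 15% rate come from the abstract.

```python
import random
from collections import Counter
from math import dist


# Hypothetical base learner 1 (not necessarily one used in the paper):
# predicts the class whose training mean is closest in Euclidean distance.
class NearestCentroid:
    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        # Per-class mean of every feature column this member was given.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


# Hypothetical base learner 2: label of the single closest training example.
class OneNearestNeighbor:
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        best = min(range(len(self.X)), key=lambda i: dist(x, self.X[i]))
        return self.y[best]


class HybridRandomSubsampleEnsemble:
    """Sketch only. ASSUMES subsample_rate is the fraction of features each
    member sees, members are mixed learner types, and voting is by majority."""

    def __init__(self, learner_types, n_members=25, subsample_rate=0.15, seed=0):
        self.learner_types = learner_types
        self.n_members = n_members
        self.subsample_rate = subsample_rate
        self.rng = random.Random(seed)

    def fit(self, X, y):
        n_features = len(X[0])
        k = max(1, round(self.subsample_rate * n_features))
        self.members = []
        for i in range(self.n_members):
            # Each member is trained on its own random subset of k feature indices.
            features = sorted(self.rng.sample(range(n_features), k))
            # "Hybrid" (assumed): cycle through the supplied learner types.
            learner = self.learner_types[i % len(self.learner_types)]()
            learner.fit([[row[j] for j in features] for row in X], y)
            self.members.append((features, learner))
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            votes = Counter(
                learner.predict_one([row[j] for j in features])
                for features, learner in self.members
            )
            predictions.append(votes.most_common(1)[0][0])
        return predictions


# Toy data (illustration only): two classes whose means differ by 1 on every feature.
data_rng = random.Random(1)


def make_data(n, n_features=40, noise=0.5):
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label + data_rng.gauss(0, noise) for _ in range(n_features)])
        y.append(label)
    return X, y


X_train, y_train = make_data(60)
X_test, y_test = make_data(40)
model = HybridRandomSubsampleEnsemble([NearestCentroid, OneNearestNeighbor]).fit(X_train, y_train)
accuracy = sum(p == t for p, t in zip(model.predict(X_test), y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.2f}")
```

In this setup there are 40 features and a 15% rate, so each of the 25 members sees only 6 features, and the members alternate between the two learner types. Cycling through learner types is just one simple way to mix base classifiers. Other allocations, or other combination rules such as weighted voting, would also fit the abstract's description.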