Objective: We explore the link between dataset complexity, which determines how difficult a dataset is to classify, and classification performance, measured by the low-variance, low-bias bolstered resubstitution error of k-nearest neighbor classifiers.

Methods and material: Gene expression based cancer classification serves as the task in this study. Six gene expression datasets covering different types of cancer constitute the test data.

Results: Through extensive simulation, combined with the copula method for analyzing association in bivariate data, we show that dataset complexity and bolstered resubstitution error are dependent. Building on this finding, we propose a new scheme for generating ensembles of classifiers. The scheme selects low-complexity feature subsets for the ensemble members, which, according to the observed dependence, yields accurate members.

Conclusion: Experiments with the six gene expression datasets demonstrate that our ensemble generation scheme, based on the dependence between dataset complexity and classification error, outperforms both the single best classifier in the ensemble and the traditional ensemble construction scheme that ignores dataset complexity.
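The abstract names the ingredients of the scheme but does not give their details. The following standard-library Python sketch shows one way they could fit together for a two-class problem. Several choices here are assumptions and do not come from the source:

- Ho and Basu's maximum Fisher's discriminant ratio (F1) stands in for "dataset complexity". A higher F1 means lower complexity.
- The bolstering kernel is a spherical Gaussian with a fixed width `sigma`, integrated by Monte Carlo sampling.
- Candidate feature subsets are drawn at random.
- Members are combined by majority vote.

All function names and the toy data are hypothetical, and the copula-based dependence analysis is not shown.

```python
# Sketch only: F1 as the complexity measure, fixed-width Gaussian bolstering,
# random candidate subsets and majority voting are all assumptions.
import math
import random
from collections import Counter


def make_toy_data(seed=42, n_per_class=20):
    """Hypothetical two-class toy data: features 0-1 separate the classes, 2-4 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for lab in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(3.0 * lab, 0.3) for _ in range(2)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
            X.append(informative + noise)
            y.append(lab)
    return X, y


def project(X, features):
    """Keep only the listed feature columns of each sample."""
    return [[row[f] for f in features] for row in X]


def fisher_ratio(X, y, features):
    """Maximum Fisher's discriminant ratio (F1) over `features` for labels 0/1.
    Assumed complexity measure: a larger value means lower complexity."""
    best = 0.0
    for f in features:
        a = [row[f] for row, lab in zip(X, y) if lab == 0]
        b = [row[f] for row, lab in zip(X, y) if lab == 1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((v - ma) ** 2 for v in a) / len(a)
        vb = sum((v - mb) ** 2 for v in b) / len(b)
        best = max(best, (ma - mb) ** 2 / (va + vb + 1e-12))
    return best


def knn_predict(Xp, y, x, k=3):
    """Majority label of the k training points nearest to x (Euclidean distance)."""
    nearest = sorted((math.dist(row, x), lab) for row, lab in zip(Xp, y))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


def bolstered_resub_error(Xp, y, k=3, sigma=0.5, n_mc=20, seed=0):
    """Monte Carlo bolstered resubstitution error of k-NN trained on all of (Xp, y).
    Each training point is spread into a spherical Gaussian of width sigma (assumed
    fixed); the error is the average fraction of that mass labelled wrongly."""
    rng = random.Random(seed)
    total = 0.0
    for x, lab in zip(Xp, y):
        wrong = 0
        for _ in range(n_mc):
            z = [v + rng.gauss(0.0, sigma) for v in x]
            wrong += knn_predict(Xp, y, z, k) != lab
        total += wrong / n_mc
    return total / len(Xp)


def select_low_complexity_subsets(X, y, n_features, subset_size, n_candidates, n_members, seed=0):
    """Draw random feature subsets and keep the n_members with the lowest complexity
    (highest F1); these become the feature sets of the ensemble members."""
    rng = random.Random(seed)
    candidates = {tuple(sorted(rng.sample(range(n_features), subset_size)))
                  for _ in range(n_candidates)}
    ranked = sorted(candidates, key=lambda s: fisher_ratio(X, y, s), reverse=True)
    return ranked[:n_members]


def ensemble_predict(X, y, members, x, k=3):
    """Majority vote of k-NN members, each restricted to its own feature subset."""
    votes = Counter(knn_predict(project(X, s), y, [x[f] for f in s], k) for s in members)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X, y = make_toy_data()
    members = select_low_complexity_subsets(X, y, n_features=5, subset_size=2,
                                            n_candidates=30, n_members=3)
    print("selected feature subsets:", members)
    print("bolstered error, feature 0:", bolstered_resub_error(project(X, [0]), y))
    print("bolstered error, feature 2:", bolstered_resub_error(project(X, [2]), y))
    acc = sum(ensemble_predict(X, y, members, x) == lab for x, lab in zip(X, y)) / len(X)
    print("ensemble resubstitution accuracy:", acc)
```

On this toy data the selected subsets should each contain at least one of the informative features 0 and 1. The bolstered error on feature 0 should also be far lower than on noise feature 2. This is a small-scale version of the relation between low complexity and low error that motivates the scheme, not a reproduction of the study's experiments.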