Cancer diagnosis at the molecular level is now possible through the analysis of gene expression data. Typically, machine learning (ML) techniques are used to build automatic diagnosis models (classifiers) from cancer gene expression data. Such data often present characteristics that can harm the generalization ability of the resulting classifiers, among them data sparsity and an unbalanced class distribution. We investigate the results produced by a set of indices that extract intrinsic complexity information from the data. These measures can be used to analyze, among other things, which characteristics of cancer gene expression data most affect the predictive ability of support vector machine classifiers. In this context, we also show that applying a proper feature selection procedure to the data can reduce the influence of those characteristics on the error rates of the induced classifiers.
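To make the idea of a data complexity index concrete, the following is a minimal sketch in Python with NumPy. The abstract does not say which indices are used, so the three shown here are assumptions chosen for illustration: the maximum Fisher's discriminant ratio over features (a widely used class-separability measure), the ratio between the largest and smallest class sizes (for class imbalance), and the number of samples per feature (a rough proxy for data sparsity). The function names are hypothetical.

```python
import numpy as np


def max_fisher_ratio(X, y):
    """Maximum Fisher's discriminant ratio over all features (two classes).

    Higher values suggest that at least one feature separates the classes well.
    This is an illustrative choice, not necessarily an index used in the paper.
    """
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch handles exactly two classes")
    a, b = X[y == classes[0]], X[y == classes[1]]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)
    # Features that are constant within both classes get a ratio of 0.
    ratios = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(ratios.max())


def imbalance_ratio(y):
    """Size of the largest class divided by the size of the smallest class."""
    _, counts = np.unique(y, return_counts=True)
    return float(counts.max() / counts.min())


def samples_per_feature(X):
    """Samples divided by features; small values indicate sparse data."""
    return X.shape[0] / X.shape[1]


if __name__ == "__main__":
    # Synthetic stand-in for an expression matrix: 40 samples, 500 "genes".
    rng = np.random.default_rng(0)
    y = np.array([0] * 30 + [1] * 10)
    X = rng.normal(size=(40, 500))
    X[y == 1, 0] += 3.0  # make one feature informative
    print(max_fisher_ratio(X, y), imbalance_ratio(y), samples_per_feature(X))
```

Computing such indices before and after a feature selection step is one way to check whether the selection reduces the measured complexity of a data set.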