Classification of high dimensional and imbalanced hyperspectral imagery data
IbPRIA'11 Proceedings of the 5th Iberian conference on Pattern recognition and image analysis
MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
Microarray expression data, which contain the expression levels of a large number of simultaneously observed genes, have been used in many scientific and clinical studies. Because such data are high-dimensional, selecting a small number of genes has been shown to be beneficial for tasks such as building prediction models for the molecular classification of cancers. Traditional gene selection methods, however, fail to take the sample distribution into consideration. Because samples are scarce, biomedical research very commonly has severely biased data distributions, in which one class (e.g., diseased samples) has far fewer examples than the other classes (e.g., normal samples). Sample sets with biased distributions require special attention when identifying the genes responsible for a particular disease. In this paper, we propose three filtering techniques, Higher Weight (HW), Differential Minority Repeat (DMR), and Balanced Minority Repeat (BMR), to identify genes relevant to fatal diseases from biased microarray expression data. Experimental comparisons with the traditional ReliefF method on five microarray datasets demonstrate the effectiveness of the proposed methods in selecting informative genes from microarray expression data with biased sample distributions.
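For readers unfamiliar with the baseline, the following is a minimal sketch of Relief-style filter scoring for two classes, the simplified precursor of the ReliefF method used in the comparison. It is not the authors' code and it does not implement HW, DMR, or BMR, whose details the abstract does not give. The function name `relief_scores`, the Manhattan distance on range-normalized features, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def relief_scores(X, y):
    """Score each feature (gene) with a basic two-class Relief filter.

    Illustrative sketch only: one nearest hit and one nearest miss per
    sample, Manhattan distance on range-normalized features.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    classes, counts = np.unique(y, return_counts=True)
    if len(classes) != 2 or counts.min() < 2:
        raise ValueError("need exactly two classes with at least 2 samples each")

    # Scale every feature to [0, 1] so that differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant features get a score of 0
    Xn = (X - X.min(axis=0)) / span

    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(Xn - Xn[i])   # per-feature differences to sample i
        dist = diff.sum(axis=1)     # Manhattan distance to sample i
        dist[i] = np.inf            # a sample is never its own neighbor
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        # Reward features that separate the classes, penalize those that
        # vary within a class.
        w += (diff[miss] - diff[hit]) / n
    return w

# Toy imbalanced data (hypothetical): 8 "normal" vs. 3 "diseased" samples.
# Feature 0 separates the classes; feature 1 is noise.
X = np.array([
    [0.00, 0.0], [0.10, 1.0], [0.20, 0.5], [0.10, 0.2],
    [0.00, 0.8], [0.20, 0.3], [0.10, 0.7], [0.15, 0.6],
    [0.90, 0.1], [1.00, 0.9], [0.80, 0.4],
])
y = np.array([0] * 8 + [1] * 3)

scores = relief_scores(X, y)
ranking = np.argsort(scores)[::-1]  # feature indices, best first
print(scores, ranking)
```

In this plain form every sample contributes equally to the weight update, so the majority class dominates the scores. That is the behavior on biased sample distributions that the proposed filters are meant to address.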