Active mining discriminative gene sets

Authors:
Feng Chu;Lipo Wang
Affiliations:
College of Information Engineering, Xiangtan University, Xiangtan, Hunan, China;College of Information Engineering, Xiangtan University, Xiangtan, Hunan, China
Venue:
ICAISC'06 Proceedings of the 8th international conference on Artificial Intelligence and Soft Computing
Year:
2006

Abstract

Searching for good discriminative gene sets (DGSs) in microarray data is important for many problems, such as precise cancer diagnosis, correct treatment selection, and drug discovery. Small, good DGSs help researchers eliminate “irrelevant” genes and focus on “critical” genes that may serve as biomarkers or that are related to the development of cancers. In addition, small DGSs do not impose demanding requirements on classifiers, e.g., high-speed CPUs or large memory. Furthermore, if DGSs are used as diagnostic measures in the future, small DGSs will simplify the test and therefore reduce its cost. Here, we propose an algorithm for searching for DGSs, which we call active mining discriminative gene sets (AM-DGS). The search scheme of AM-DGS is as follows. A gene with a large t-statistic is chosen as the seed, i.e., the first feature of the DGS. We then classify the samples in the data set using a support vector machine (SVM). Next, we add to the DGS the gene with the greatest power to correct the misclassified samples, that is, the gene with the largest t-statistic evaluated on the misclassified samples only. We keep adding genes to the DGS according to the SVM's misclassified samples until no errors remain or overfitting occurs. We tested the proposed method on the well-known leukemia data set. On this data set, our method obtained two 2-gene DGSs that achieved 94.1% testing accuracy and a 4-gene DGS that achieved 97.1% testing accuracy. These results show that our method obtains better accuracy with much smaller DGSs than three widely used methods: t-statistics, F-statistics, and SVM-based recursive feature elimination (SVM-RFE).
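
The search scheme described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `t_statistic` and `am_dgs`, the use of scikit-learn's linear-kernel `SVC`, the Welch form of the t-statistic, the restriction to two classes labeled 0 and 1, and the `max_genes` cap (a stand-in for the overfitting check, which the abstract does not specify) are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.svm import SVC


def t_statistic(X, y):
    """Absolute Welch t-statistic of every gene (column) between classes 0 and 1."""
    a, b = X[y == 0], X[y == 1]
    if len(a) < 2 or len(b) < 2:
        # Assumption: with fewer than two samples in a class, no gene can be ranked.
        return np.zeros(X.shape[1])
    var = a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(var + 1e-12)


def am_dgs(X, y, seed_gene=None, max_genes=10):
    """Grow a discriminative gene set guided by the SVM's training errors.

    X: (samples, genes) expression matrix; y: array of 0/1 class labels.
    seed_gene: index of the first gene; defaults to the top-ranked gene.
    max_genes: assumed cap on the set size, standing in for an overfitting check.
    """
    if seed_gene is None:
        seed_gene = int(np.argmax(t_statistic(X, y)))
    selected = [seed_gene]
    while True:
        # Classify all samples using only the genes selected so far.
        clf = SVC(kernel="linear").fit(X[:, selected], y)
        wrong = clf.predict(X[:, selected]) != y
        if not wrong.any() or len(selected) >= max_genes:
            break
        # Rank genes on the misclassified samples only.
        scores = t_statistic(X[wrong], y[wrong])
        scores[selected] = -np.inf  # never re-add a gene already in the set
        best = int(np.argmax(scores))
        if scores[best] <= 0:
            break  # no remaining gene separates the misclassified samples
        selected.append(best)
    return selected
```

Passing different values of `seed_gene` would yield different gene sets from the same data, which is one plausible way to obtain several small DGSs. In practice the stopping rule would more likely monitor accuracy on held-out samples than rely on a fixed cap.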