High-throughput measurement of mRNA abundance has made vast amounts of gene expression data available, providing a new basis for disease diagnosis. Microarray-based classification assigns disease states from patients' gene expression profiles, and many methods have been proposed to identify diagnostic markers that accurately discriminate between different classes of a disease. In pathway-based classification, using only a subset of the genes in a pathway, such as the so-called condition-responsive genes (CORGs), may not fully represent the two classification boundaries for the Case and Control classes. In this study, we propose negatively correlated feature sets (NCFS) for identifying CORGs and inferring pathway activities. Our two proposed methods, NCFS-i and NCFS-c, achieve higher disease classification accuracy and identify more phenotype-correlated genes in each pathway than several existing pathway activity inference methods.
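The idea of summarizing a pathway with two oppositely correlated gene sets can be illustrated with a minimal sketch. This is not NCFS-i or NCFS-c, which the abstract does not specify. It assumes a CORG-style greedy search: genes are z-scored, ranked by t-statistic, and added while the activity's discriminative score keeps improving, once for genes higher in Case and once for genes higher in Control. The function names, the Welch t-statistic, and the sum-over-square-root activity formula are all assumptions made for illustration.

```python
import numpy as np

def t_score(values, labels):
    """Welch t-statistic of `values` between Case (1) and Control (0) samples."""
    case, ctrl = values[labels == 1], values[labels == 0]
    se = np.sqrt(case.var(ddof=1) / case.size + ctrl.var(ddof=1) / ctrl.size)
    return (case.mean() - ctrl.mean()) / se

def greedy_activity(z, labels, order):
    """Add genes in `order` while the |t| of the combined activity improves.

    Assumed activity formula: sum of z-scores divided by sqrt(number of genes).
    """
    chosen, best, best_act = [], 0.0, np.zeros(z.shape[1])
    for g in order:
        trial = chosen + [g]
        act = z[trial].sum(axis=0) / np.sqrt(len(trial))
        score = abs(t_score(act, labels))
        if score <= best:
            break  # stop at the first gene that does not help
        chosen, best, best_act = trial, score, act
    return chosen, best_act

def pathway_activities(expr, labels):
    """Return (genes, activity) for the up-in-Case and down-in-Case gene sets.

    expr: genes x samples matrix for one pathway; labels: 0/1 per sample.
    """
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, ddof=1, keepdims=True)
    z = (expr - mean) / std
    t = np.array([t_score(row, labels) for row in z])
    up = [g for g in np.argsort(-t) if t[g] > 0]    # most positive t first
    down = [g for g in np.argsort(t) if t[g] < 0]   # most negative t first
    return greedy_activity(z, labels, up), greedy_activity(z, labels, down)
```

Each pathway then contributes two activity values per sample instead of one, which could be fed to any standard classifier. Whether the two sets are combined or kept as separate features is a design choice this sketch leaves open.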