Identifying Simple Discriminatory Gene Vectors with an Information Theory Approach

Authors:
Zheng Yun;Kwoh Chee Keong
Affiliations:
Nanyang Technological University;Nanyang Technological University
Venue:
CSB '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference
Year:
2005

Abstract

In feature selection for cancer classification, many existing methods consider genes individually, choosing the top-ranked genes by signal-to-noise statistic or correlation coefficient. However, the information about the class distinction that such genes provide may overlap heavily, since their expression patterns are similar. Including many genes with similar expression patterns is redundant and results in highly complex classifiers. According to the principle of Occam's razor, simple models are preferable to complex ones if they achieve comparable prediction performance. In this paper, we introduce a new method to learn accurate, low-complexity classifiers from gene expression profiles. In our method, we use mutual information to measure the relation between a set of genes, called a gene vector, and the class attribute of the samples. Gene vectors lie in higher-dimensional spaces than individual genes; they are therefore more diverse and can contain more information about the class attribute than individual genes. Hence, gene vectors are preferable to individual genes for describing the class distinctions between samples. We validate our method on three gene expression profiles. Compared with results from the literature and with other well-known classification methods, our method achieves better or comparable prediction performance while using lower-complexity models.
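
The central quantity in the abstract, the mutual information between a gene vector and the class attribute, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `entropy` and `mutual_information`, the toy data, and the assumption that expression values have already been discretized are all hypothetical choices made for illustration. The sketch uses the standard identity I(X; Y) = H(X) + H(Y) - H(X, Y), where X is the tuple of gene states observed in a sample.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of hashable symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(gene_states, labels):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for per-sample states X and labels Y.

    gene_states holds one hashable entry per sample: a single discretized
    value for one gene, or a tuple of values for a gene vector.
    """
    joint = list(zip(gene_states, labels))
    return entropy(gene_states) + entropy(labels) - entropy(joint)


# Hypothetical toy data (an assumption, not from the paper): four samples,
# two binary-discretized genes, and a class label equal to their XOR.
gene_a = [0, 0, 1, 1]
gene_b = [0, 1, 0, 1]
labels = [0, 1, 1, 0]

# A gene vector pairs up the states of several genes for each sample.
vector_ab = list(zip(gene_a, gene_b))

mi_a = mutual_information(gene_a, labels)
mi_b = mutual_information(gene_b, labels)
mi_ab = mutual_information(vector_ab, labels)

print(f"I(gene_a; class)           = {mi_a:.3f} bits")
print(f"I(gene_b; class)           = {mi_b:.3f} bits")
print(f"I((gene_a, gene_b); class) = {mi_ab:.3f} bits")
```

In this contrived example each gene alone carries no information about the class, while the two-gene vector determines it completely. Real expression data would need a discretization step first, and estimates from small samples can be biased upward as the vector grows.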