This paper studies the problem of class discrimination based on the normalized maximum likelihood (NML) model for a nonlinear regression, where the nonlinearly transformed class labels, each taking M possible values, are assumed to be drawn from a multinomial trial process. The strength of MDL methods in statistical inference lies in finding the model structure, which in this particular classification problem amounts to finding the best set of feature genes. We first show that minimizing the codelength of the NML model over different sets of feature genes is a tractable problem. We then extend the model for selecting the feature genes to a completely defined classifier and check its classification error in a cross-validation experiment. The quantization process involved in obtaining the required entries of the model can also be evaluated with the NML description length. The new classification method is applied to leukemia class discrimination based on gene expression microarray data. We find classification errors as low as 0.03% with a quadruplet of binary quantized genes, which was top ranked by the NML description length. The description length of the class labels, obtained with various sets of feature genes in the nonlinear regression model, allows intuitive comparisons of nested feature sets.
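As a rough illustration of ranking feature sets by description length, the sketch below scores subsets of binary quantized features by an NML code length of the class labels. It is a simplified stand-in and not the model of the paper: it assumes two classes (M = 2), groups samples by their feature pattern, and codes the labels in each group with a separate Bernoulli NML code. The function names (`bernoulli_nml_normalizer`, `nml_codelength_bits`, `subset_codelength_bits`, `best_subset`) and the exhaustive search are hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations
from math import comb, log2


def bernoulli_nml_normalizer(n):
    """Sum of maximized Bernoulli likelihoods over all binary sequences of length n."""
    if n == 0:
        return 1.0
    total = 0.0
    for h in range(n + 1):
        p = h / n
        # Python evaluates 0.0 ** 0 as 1.0, which is the convention needed here.
        total += comb(n, h) * (p ** h) * ((1 - p) ** (n - h))
    return total


def nml_codelength_bits(labels):
    """NML code length (bits) of a 0/1 label sequence under the Bernoulli class."""
    n = len(labels)
    if n == 0:
        return 0.0
    ones = sum(labels)
    ml_bits = 0.0
    for count in (ones, n - ones):
        if count > 0:
            ml_bits -= count * log2(count / n)
    return ml_bits + log2(bernoulli_nml_normalizer(n))


def subset_codelength_bits(X, y, subset):
    """Assumed simplification: one independent Bernoulli NML code per feature pattern."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[tuple(row[j] for j in subset)].append(label)
    return sum(nml_codelength_bits(g) for g in groups.values())


def best_subset(X, y, k):
    """Exhaustive search over all feature subsets of size k (small toy problems only)."""
    n_features = len(X[0])
    return min(
        combinations(range(n_features), k),
        key=lambda s: subset_codelength_bits(X, y, s),
    )


if __name__ == "__main__":
    # Toy data: the class label equals the feature at index 1.
    X = [[0, 0, 1], [1, 0, 0], [0, 1, 1], [1, 1, 0],
         [0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
    y = [0, 0, 1, 1, 0, 1, 1, 0]
    for k in (1, 2):
        s = best_subset(X, y, k)
        print(k, s, round(subset_codelength_bits(X, y, s), 3))
```

Because every subset is scored in the same unit (bits needed to describe the class labels), subsets of different sizes can be compared directly. The normalizer term grows with the number of distinct feature patterns, so larger subsets are penalized in this sketch.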