This paper presents a data mining approach to estimating multispecies gene entropy, using a self-organizing map (SOM) to mine a homologous gene set. The distribution function of each gene is approximated by its probability distribution in the feature space. The phylogenetic applications of multispecies gene entropy are investigated by inferring the species phylogeny of eight yeast species. Genes with the smallest Kullback-Leibler (K-L) distances to the minimum entropy gene are found to be more likely to be phylogenetically informative. The K-L distances of genes are strongly correlated with the spectral radii of their identity percentage matrices. After a fast Fourier transform (FFT), the images of the identity percentage matrices of genes with small K-L distances to the minimum entropy gene are more similar, in the frequency domain, to the image of the minimum entropy gene than are those of genes with large K-L distances. Finally, a K-L distance based gene concatenation approach under gene clustering is proposed to infer species phylogenies robustly and systematically.
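The ranking idea in the abstract (find the minimum entropy gene, then order the other genes by their K-L distance to it) can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption: each gene is represented by hypothetical hit counts over four feature-space units, the counts are smoothed with a small constant to avoid division by zero, entropy is Shannon entropy in bits, and the K-L distance is taken in the direction D(gene || reference). The gene names and counts are invented toy values, not data from the paper.

```python
import math

def normalize(counts, eps=1e-9):
    """Turn per-unit hit counts into a smoothed probability distribution.
    The additive eps smoothing is an assumption, not the paper's method."""
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

def rank_by_kl_to_min_entropy(gene_counts):
    """Return the minimum entropy gene and all genes sorted by
    increasing K-L distance to it."""
    dists = {g: normalize(c) for g, c in gene_counts.items()}
    ref = min(dists, key=lambda g: entropy(dists[g]))
    ranking = sorted(dists, key=lambda g: kl_divergence(dists[g], dists[ref]))
    return ref, ranking

# Hypothetical toy data: hit counts of three genes over four units.
toy_counts = {
    "geneA": [9, 1, 0, 0],
    "geneB": [8, 2, 0, 0],
    "geneC": [3, 3, 2, 2],
}
ref_gene, ranked_genes = rank_by_kl_to_min_entropy(toy_counts)
print(ref_gene, ranked_genes)
```

In this toy setting the most concentrated distribution is chosen as the reference, and genes whose distributions resemble it rank first. Under the abstract's finding, those would be the candidates most likely to be phylogenetically informative.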