Multispecies gene entropy estimation, a data mining approach

Authors:
Xiaoxu Han
Affiliations:
Department of Mathematics and Bioinformatics Program, Eastern Michigan University, Ypsilanti, MI
Venue:
ICDM'06 Proceedings of the 6th Industrial Conference on Data Mining conference on Advances in Data Mining: applications in Medicine, Web Mining, Marketing, Image and Signal Mining
Year:
2006

Abstract

This paper presents a data mining approach to estimating multispecies gene entropy, using a self-organizing map (SOM) to mine a set of homologous genes. The distribution function of each gene in the feature space is approximated by the gene's probability distribution over that space. The phylogenetic applications of multispecies gene entropy are investigated through an example: inferring the species phylogeny of eight yeast species. Genes with the smallest Kullback-Leibler (K-L) distances to the minimum-entropy gene are found to be more likely to be phylogenetically informative. The K-L distances of genes are strongly correlated with the spectral radii of their identity percentage matrices. After a fast Fourier transform (FFT), the images of the identity percentage matrices of genes with small K-L distances to the minimum-entropy gene are more similar, in the frequency domain, to the image of the minimum-entropy gene than are the images of genes with large K-L distances. Finally, a K-L distance-based gene concatenation approach, used in conjunction with gene clustering, is proposed to infer species phylogenies robustly and systematically.
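The abstract does not spell out the computation, so what follows is only a minimal Python sketch under stated assumptions, not the paper's method. It assumes each gene is summarized by hypothetical hit counts of its encoded sequences on the nodes of one shared, already-trained SOM. These counts are normalized into a probability distribution, gene entropy is taken as Shannon entropy in bits, and the "K-L distance" is taken as the divergence D(P_gene || P_min) from each gene to the minimum-entropy gene. The function names, the smoothing constant `eps`, and the direction of the divergence are all illustrative assumptions. A power-iteration helper estimates the spectral radius of an identity percentage matrix, assumed here to have strictly positive entries.

```python
import math

# Sketch only. Assumption: each gene is summarized by how many of its
# (hypothetically encoded) sequences fall on each node of one shared,
# already-trained SOM. All names here are illustrative, not from the paper.


def to_distribution(counts, eps=1e-9):
    """Normalize SOM node hit counts into a probability distribution.

    eps is added to every node so the K-L divergence stays finite when a
    node is empty; this smoothing choice is an assumption."""
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def entropy(p):
    """Shannon entropy of distribution p, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits (q must be > 0 where p > 0)."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def rank_by_kl_to_min_entropy(gene_counts):
    """Find the minimum-entropy gene and sort all genes by K-L distance to it.

    gene_counts maps a gene name to its list of per-node hit counts; all
    lists must have the same length (one entry per SOM node)."""
    dists = {g: to_distribution(c) for g, c in gene_counts.items()}
    ref = min(dists, key=lambda g: entropy(dists[g]))
    ranked = sorted(
        ((g, kl_divergence(p, dists[ref])) for g, p in dists.items()),
        key=lambda item: item[1],
    )
    return ref, ranked


def spectral_radius(matrix, iters=1000, tol=1e-12):
    """Spectral radius of a square matrix with strictly positive entries.

    By Perron-Frobenius, such a matrix has a simple dominant eigenvalue that
    is real and positive, so plain power iteration converges to it."""
    n = len(matrix)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = max(w)  # entries of w are positive, so max == max |w_i|
        v = [x / new_lam for x in w]
        if abs(new_lam - lam) < tol * new_lam:
            return new_lam
        lam = new_lam
    return lam


if __name__ == "__main__":
    hits = {
        "geneA": [10, 0, 0, 0],  # concentrated on one node: low entropy
        "geneB": [8, 2, 0, 0],
        "geneC": [3, 3, 2, 2],   # spread over the map: high entropy
    }
    ref, ranked = rank_by_kl_to_min_entropy(hits)
    print(ref)                     # geneA
    print([g for g, _ in ranked])  # ['geneA', 'geneB', 'geneC']
    print(spectral_radius([[100.0, 80.0], [80.0, 100.0]]))  # 180.0
```

Because empty nodes are smoothed with `eps`, the absolute K-L values in this sketch are sensitive to that constant. The SOM training and the FFT-based image comparison are outside its scope.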