Pattern Recognition with Neural Network in C++
Pattern Recognition with Neural Network in C++
On Clustering Validation Techniques
Journal of Intelligent Information Systems
Genomic Comparison of Bacterial Species Based on Metabolic Characteristics
IJCBS '09 Proceedings of the 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing
Automatic extraction of clusters from hierarchical clustering representations
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
IEEE Transactions on Pattern Analysis and Machine Intelligence
Hi-index | 12.05 |
Studying the communities of microbial species is highly important since many natural and artificial processes are mediated by groups of microbes rather than by single entities. One way of studying them is the search of common metabolic characteristics among microbial species, which is not only a potential measure for the differentiation and classification of closely-related organisms but also their study allows the finding of common functional properties that may describe the way of life of entire organisms or species. In this work we propose an expert system (ES), making the main contribution, to cluster a complex data set of 365 prokaryotic species by 114 metabolic features, information which may be incomplete for some species. Inspired on the human expert reasoning and based on hierarchical clustering strategies, our proposed ES estimates the optimal number of clusters adequate to divide the dataset and afterwards it starts an iterative process of clustering, based on the Self-organizing Maps (SOM) approach, where it finds relevant clusters at different steps by means of a new validity index inspired on the well-known Davies Bouldin (DB) index. In order to monitor the process and assess the behavior of the ES the partition obtained at each step is validated with the DB validity index. The resulting clusters prove that the use of metabolic features combined with the ES is able to handle a complex dataset that can help in the extraction of underlying information, gaining advantage over other existing approaches, that may relate metabolism with phenotypic, environmental or evolutionary characteristics in prokaryotic species.