Expert system for clustering prokaryotic species by their metabolic features

  • Authors:
  • Clara Higuera; Gonzalo Pajares; Javier Tamames; Federico Morán

  • Affiliations:
  • Dpt. Bioquímica y Biología Molecular I, Facultad de Ciencias Químicas, Universidad Complutense de Madrid, avda. Complutense s/n, 28040 Madrid, Spain and Dpto. Ingeniería del So ...;Dpto. Ingeniería del Software e Inteligencia Artificial, Facultad Informática, Universidad Complutense, C/ Prof. José García Santesmases, s/n. 28040 Madrid, Spain;Centro Nacional de Biotecnología, National Research Council (CSIC), c/Darwin, 3. Cantoblanco, 28049 Madrid, Spain;Dpt. Bioquímica y Biología Molecular I, Facultad de Ciencias Químicas, Universidad Complutense de Madrid, avda. Complutense s/n, 28040 Madrid, Spain

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2013

Abstract

Studying microbial communities is highly important because many natural and artificial processes are mediated by groups of microbes rather than by single organisms. One way to study them is to search for metabolic characteristics shared among microbial species. Such characteristics are a potential measure for differentiating and classifying closely related organisms, and they can also reveal common functional properties that describe the way of life of entire organisms or species. The main contribution of this work is an expert system (ES) that clusters a complex dataset of 365 prokaryotic species described by 114 metabolic features, information that may be incomplete for some species. Inspired by the reasoning of a human expert and based on hierarchical clustering strategies, the proposed ES first estimates the optimal number of clusters into which to divide the dataset. It then starts an iterative clustering process, based on the Self-Organizing Map (SOM) approach, in which relevant clusters are identified at different steps by means of a new validity index inspired by the well-known Davies-Bouldin (DB) index. To monitor the process and assess the behavior of the ES, the partition obtained at each step is validated with the DB index. The resulting clusters show that combining metabolic features with the ES makes it possible to handle a complex dataset and to extract underlying information that may relate metabolism to phenotypic, environmental or evolutionary characteristics of prokaryotic species, with advantages over other existing approaches.
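The abstract says each partition is validated with the Davies-Bouldin index. As a rough illustration, the sketch below computes the standard DB index: for each cluster it takes the worst ratio of summed intra-cluster scatter to centroid separation against any other cluster, then averages these ratios (lower means more compact, better separated clusters). It is not the authors' new DB-inspired index, whose definition is not given here. It assumes complete numeric feature vectors and Euclidean distance; handling the incomplete metabolic data mentioned above is out of scope. The function name `davies_bouldin` and the toy binary "metabolic profiles" are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(data: Sequence[Vector], labels: Sequence[int]) -> float:
    """Standard Davies-Bouldin index (assumption: Euclidean distance, no missing values)."""
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    clusters: Dict[int, List[Vector]] = defaultdict(list)
    for x, lab in zip(data, labels):
        clusters[lab].append(x)
    if len(clusters) < 2:
        raise ValueError("the DB index needs at least two clusters")

    keys = sorted(clusters)
    centroids = {k: _centroid(clusters[k]) for k in keys}
    # S_k: mean distance of the members of cluster k to its centroid (scatter)
    scatter = {
        k: sum(math.dist(x, centroids[k]) for x in clusters[k]) / len(clusters[k])
        for k in keys
    }

    total = 0.0
    for i in keys:
        # D_i: largest (S_i + S_j) / ||c_i - c_j|| over all other clusters j
        worst = 0.0
        for j in keys:
            if i == j:
                continue
            sep = math.dist(centroids[i], centroids[j])
            ratio = math.inf if sep == 0 else (scatter[i] + scatter[j]) / sep
            worst = max(worst, ratio)
        total += worst
    return total / len(keys)


# Hypothetical presence/absence profiles over four metabolic features
data = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1],
]
good = [0, 0, 0, 1, 1, 1]  # groups similar profiles together
bad = [0, 1, 0, 1, 0, 1]   # mixes dissimilar profiles

print(davies_bouldin(data, good), davies_bouldin(data, bad))
```

Comparing the two partitions shows the intended use: the partition that groups similar profiles scores lower than the one that mixes them, which is how a DB-style index can rank candidate partitions produced at successive steps of an iterative clustering process.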