Evidence-Based Clustering of Reads and Taxonomic Analysis of Metagenomic Data

Authors:
Gianluigi Folino; Fabio Gori; Mike S. Jetten; Elena Marchiori
Affiliations:
ICAR-CNR, Rende, Italy; Radboud University, Nijmegen, The Netherlands; Radboud University, Nijmegen, The Netherlands; Radboud University, Nijmegen, The Netherlands
Venue:
PRIB '09 Proceedings of the 4th IAPR International Conference on Pattern Recognition in Bioinformatics
Year:
2009

Abstract

The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms in order to understand their roles and interactions in an ecosystem. In this paper we focus on clustering methods and their application to the taxonomic analysis of metagenomic data. Clustering analysis for metagenomics amounts to grouping similar partial sequences, such as raw sequence reads, into clusters in order to discover information about the internal structure of the dataset under study or the relative abundance of protein families. Various methods for the clustering analysis of metagenomic datasets have been proposed. Here we focus on evidence-based clustering methods that employ knowledge extracted from proteins identified by a BLASTx search (proxygenes). We consider two clustering algorithms introduced in previous work and a new one. We discuss the advantages and drawbacks of the algorithms and use them to perform taxonomic analysis of metagenomic data. For this purpose, we use three real-life benchmark datasets from previous work on metagenomic data analysis. Comparison of the results indicates satisfactory coherence among the taxonomies output by the three algorithms, with respect to phylogenetic content at the class level and taxonomic distribution at the phylum level. In general, the experimental comparative analysis substantiates the effectiveness of evidence-based clustering methods for the taxonomic analysis of metagenomic data.
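The abstract does not specify how the three algorithms turn BLASTx evidence into clusters. As a rough illustration only, the sketch below assumes one simple formulation: each protein (proxygene) "covers" the reads that hit it, proteins are chosen greedily so that each pick covers the most still-unassigned reads (a greedy set-cover heuristic, swapped in here for illustration and not claimed to be any of the paper's algorithms), and a phylum-level distribution is then computed from cluster sizes. The input format, the function names `greedy_proxygene_clusters` and `phylum_distribution`, the protein ids, and the protein-to-phylum labels are all hypothetical.

```python
from collections import Counter, defaultdict


def greedy_proxygene_clusters(hits):
    """Greedy set-cover-style clustering of reads by proxygene (illustrative sketch).

    hits: dict mapping read id -> set of protein ids hit by BLASTx
          (hypothetical input format; real BLASTx output would need parsing).
    Returns a dict mapping each chosen protein id -> set of read ids assigned to it.
    Reads with no hits are left unclustered.
    """
    # Invert the hit table: protein -> set of reads that hit it.
    reads_of = defaultdict(set)
    for read, proteins in hits.items():
        for protein in proteins:
            reads_of[protein].add(read)

    # Only reads with at least one hit can be clustered.
    uncovered = {read for read, proteins in hits.items() if proteins}
    clusters = {}
    while uncovered:
        # Choose the protein covering the most unassigned reads;
        # ties are broken by the smallest protein id for determinism.
        best = min(reads_of, key=lambda p: (-len(reads_of[p] & uncovered), p))
        members = reads_of[best] & uncovered
        clusters[best] = members
        uncovered -= members
    return clusters


def phylum_distribution(clusters, phylum_of):
    """Relative abundance per phylum, weighting each cluster by its read count.

    phylum_of: hypothetical mapping protein id -> phylum label.
    """
    counts = Counter()
    for protein, reads in clusters.items():
        counts[phylum_of.get(protein, "unassigned")] += len(reads)
    total = sum(counts.values())
    return {phylum: n / total for phylum, n in counts.items()} if total else {}


# Toy example with made-up reads, proteins and phylum labels.
hits = {
    "r1": {"pA", "pB"},
    "r2": {"pA"},
    "r3": {"pB", "pC"},
    "r4": {"pC"},
    "r5": set(),  # read without any BLASTx hit
}
phylum_of = {"pA": "Proteobacteria", "pB": "Firmicutes", "pC": "Planctomycetes"}

clusters = greedy_proxygene_clusters(hits)
print(clusters)
print(phylum_distribution(clusters, phylum_of))
```

On this toy input, `pA` is chosen first (it ties with `pB` and `pC` at two reads and wins on id), after which `pC` covers the remaining reads `r3` and `r4`, so `pB` is never used. A greedy heuristic is chosen here only because it is short and deterministic; it gives no optimality guarantee and may differ substantially from evidence-based methods that weight proteins or use other optimization strategies.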