Self-organizing and self-correcting classifications of biological data

Authors:
George M. Garrity; Timothy G. Lilburn
Affiliations:
Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI 48824, USA; Science Information Systems, American Type Culture Collection, Manassas, VA 20110, USA
Venue:
Bioinformatics
Year:
2005

Citing 0
Cited 2

Comparison of Genomic Sequences Clustering Using Normalized Compression Distance and Evolutionary Distance
KES '08 Proceedings of the 12th international conference on Knowledge-Based Intelligent Information and Engineering Systems, Part III

Soft topographic maps for clustering and classifying bacteria using housekeeping genes
Advances in Artificial Neural Systems


Abstract

Motivation: Rapid, automated means of organizing biological data are required if we hope to keep abreast of the flood of data emanating from sequencing, microarray and similar high-throughput analyses. Faced with the need to validate the annotation of thousands of sequences and to generate biologically meaningful classifications based on the sequence data, we turned to statistical methods in order to automate these processes.

Results: An algorithm for automated classification based on evolutionary distance data was written in S. The algorithm was tested on a dataset of 1436 small subunit ribosomal RNA sequences and was able to classify the sequences according to an extant scheme, use statistical measurements of group membership to detect sequences that were misclassified within this scheme and produce a new classification. In this study, the use of the algorithm to address problems in prokaryotic taxonomy is discussed.

Availability: S-Plus is available from Insightful, Inc. An S-Plus implementation of the algorithm and the associated data are available at http://taxoweb.mmg.msu.edu/datasets

Contact: garrity@msu.edu
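The idea in the Results of using group-membership statistics on distance data to detect misclassified sequences can be illustrated with a minimal sketch. This is not the authors' S-Plus implementation, and the abstract does not say which statistic they use. The sketch assumes a simple stand-in: a sequence is flagged when its mean distance to another group is smaller than its mean distance to the other members of its own group. The function name `flag_misclassified`, the toy distance matrix and the labels are all hypothetical.

```python
from collections import defaultdict


def flag_misclassified(dist, labels):
    """Flag items that sit closer, on average, to another group than to their own.

    dist   -- symmetric matrix (list of lists) of pairwise distances
    labels -- current group label of each item (the "extant scheme")
    Returns a list of (index, current_label, suggested_label) tuples.

    Assumption: mean distance is used as the membership statistic purely
    for illustration; the source abstract does not specify one.
    """
    # Collect the member indices of every group.
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)

    flagged = []
    for i, own in enumerate(labels):
        # Mean distance from item i to each group, excluding i itself.
        mean_to = {}
        for lab, members in groups.items():
            others = [j for j in members if j != i]
            if others:
                mean_to[lab] = sum(dist[i][j] for j in others) / len(others)
        if own not in mean_to:
            continue  # singleton group: nothing to compare against
        best = min(mean_to, key=mean_to.get)
        if best != own and mean_to[best] < mean_to[own]:
            flagged.append((i, own, best))
    return flagged


# Hypothetical toy data: items 0-2 are mutually close, items 3-4 are mutually
# close, but item 2 carries the label of the second group.
dist = [
    [0.00, 0.02, 0.03, 0.30, 0.31],
    [0.02, 0.00, 0.02, 0.29, 0.30],
    [0.03, 0.02, 0.00, 0.31, 0.29],
    [0.30, 0.29, 0.31, 0.00, 0.02],
    [0.31, 0.30, 0.29, 0.02, 0.00],
]
labels = ["A", "A", "B", "B", "B"]

print(flag_misclassified(dist, labels))
```

Applying the suggested labels and re-running the check until nothing is flagged would give a crude self-correcting loop in the spirit of the title, though whether the published algorithm iterates in this way is not stated in the abstract.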