Phylogenomic inference of protein molecular function: advances and challenges

Authors:
Kimmen Sjölander
Affiliations:
Berkeley Phylogenomics Group, Department of Bioengineering, University of California, 473 Evans Hall #1762, Berkeley, CA 94720-1762, USA
Venue:
Bioinformatics
Year:
2004

Abstract

Motivation: Protein families evolve a multiplicity of functions through gene duplication, speciation and other processes. As a number of studies have shown, standard methods of protein function prediction produce systematic errors on these data. Phylogenomic analysis (combining phylogenetic tree construction, integration of experimental data and differentiation of orthologs and paralogs) has been proposed to address these errors and improve the accuracy of functional classification. The explicit integration of structure prediction and analysis in this framework, which we call structural phylogenomics, provides additional insights into protein superfamily evolution.

Results: Protein functional classification using phylogenomic analysis shows fewer expected false positives overall than classification using pairwise methods. We present an overview of the motivations and fundamental principles of phylogenomic analysis, of new methods developed for its key tasks and of benchmark datasets for these tasks (when available), and we suggest procedures to increase accuracy. We also discuss some of the methods used in the Celera Genomics high-throughput phylogenomic classification of the human genome.

Availability: Software tools from the Berkeley Phylogenomics Group are available at http://phylogenomics.berkeley.edu
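
The abstract names the differentiation of orthologs and paralogs as one step of phylogenomic analysis but does not say how it is done. As an illustration only, the sketch below uses the simple species-overlap heuristic: an internal node of a gene tree whose two subtrees share a species is treated as a duplication (pairs across it are paralogs), and otherwise as a speciation (pairs across it are orthologs). This is not taken from the paper and is not claimed to be the method it describes. The nested-tuple tree encoding, the "species_gene" leaf names and the function names are all hypothetical, and the tree is assumed to be rooted and strictly binary.

```python
# Illustrative sketch only: species-overlap labelling of a rooted binary gene tree.
# Tree encoding, leaf naming ("species_gene") and function names are assumptions.

def leaves(node):
    """Return all leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    collected = []
    for child in node:
        collected.extend(leaves(child))
    return collected


def species_of(leaf):
    """Extract the species prefix from a hypothetical 'species_gene' leaf name."""
    return leaf.split("_")[0]


def classify_pairs(tree):
    """Map each leaf pair to 'ortholog' or 'paralog'.

    A pair is classified at the internal node where its two members split
    (their last common ancestor): if the two child subtrees share a species,
    the node is treated as a duplication, otherwise as a speciation.
    """
    relations = {}

    def visit(node):
        if isinstance(node, str):
            return
        left, right = node  # assumes strictly binary internal nodes
        left_leaves, right_leaves = leaves(left), leaves(right)
        shared = {species_of(x) for x in left_leaves} & {
            species_of(x) for x in right_leaves
        }
        kind = "paralog" if shared else "ortholog"
        for a in left_leaves:
            for b in right_leaves:
                relations[frozenset((a, b))] = kind
        visit(left)
        visit(right)

    visit(tree)
    return relations


# Toy gene family: an ancient duplication (A vs. B) followed by speciation.
tree = (("human_A", "mouse_A"), ("human_B", "mouse_B"))
relations = classify_pairs(tree)
for pair, kind in sorted((sorted(p), k) for p, k in relations.items()):
    print(pair, kind)
```

In this toy family, human_A and mouse_A come out as orthologs, while human_A and mouse_B come out as paralogs even though they come from different species. A pairwise best-hit comparison cannot distinguish these two cases by itself, which is why transferring a functional annotation across the second pair is less safe.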