Protein family classification and functional annotation

Authors:
Cathy H Wu;Hongzhan Huang;Lai-Su L Yeh;Winona C Barker
Affiliations:
Georgetown University Medical Center and National Biomedical Research Foundation, 3900 Reservoir Road, NW, Box 571455, Washington, DC 20057-1455, USA;Georgetown University Medical Center and National Biomedical Research Foundation, 3900 Reservoir Road, NW, Box 571455, Washington, DC 20057-1455, USA;Georgetown University Medical Center and National Biomedical Research Foundation, 3900 Reservoir Road, NW, Box 571455, Washington, DC 20057-1455, USA;Georgetown University Medical Center and National Biomedical Research Foundation, 3900 Reservoir Road, NW, Box 571455, Washington, DC 20057-1455, USA
Venue:
Computational Biology and Chemistry
Year:
2003

Abstract

With the accelerated accumulation of genomic sequence data, there is a pressing need to develop computational methods and advanced bioinformatics infrastructure for reliable and large-scale protein annotation and biological knowledge discovery. The Protein Information Resource (PIR) provides an integrated public resource of protein informatics to support genomic and proteomic research. PIR produces the Protein Sequence Database of functionally annotated protein sequences. The annotation problems are addressed by a classification-driven and rule-based method with evidence attribution, coupled with an integrated knowledge base system being developed. The approach allows sensitive identification, consistent and rich annotation, and systematic detection of annotation errors, as well as distinction of experimentally verified and computationally predicted features. The knowledge base consists of two new databases, sequence analysis tools, and graphical interfaces. PIR-NREF, a non-redundant reference database, provides a timely and comprehensive collection of all protein sequences, totaling more than 1,000,000 entries. iProClass, an integrated database of protein family, function, and structure information, provides extensive value-added features for about 830,000 proteins with rich links to over 50 molecular databases. This paper describes our approach to protein functional annotation with case studies and examines common identification errors. It also illustrates that data integration in PIR supports exploration of protein relationships and may reveal protein functional associations beyond sequence homology.
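
As a rough illustration of what classification-driven, rule-based annotation with evidence attribution could look like, the following minimal Python sketch propagates family-level rules to member proteins and tags each feature as experimentally verified or computationally predicted. It is not PIR's implementation: the class names (`Evidence`, `Rule`, `Protein`), the `annotate` function, the family identifier, and the evidence labels are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    kind: str    # "experimental" or "predicted" (hypothetical labels)
    source: str  # where the assertion comes from


@dataclass(frozen=True)
class Rule:
    family: str   # family the rule applies to (hypothetical identifier)
    feature: str  # annotated field, e.g. "function"
    value: str    # annotation text propagated to family members


@dataclass
class Protein:
    accession: str
    family: Optional[str] = None  # family assigned by classification, if any
    annotations: Dict[str, Tuple[str, Evidence]] = field(default_factory=dict)


def annotate(protein: Protein,
             rules: List[Rule],
             experimental: Dict[Tuple[str, str], str]) -> Protein:
    """Apply the rules of the protein's family and attribute evidence.

    `experimental` maps (accession, feature) to a source label for features
    that are assumed to have been verified experimentally.
    """
    for rule in rules:
        if rule.family != protein.family:
            continue  # rules only fire for proteins classified in the family
        key = (protein.accession, rule.feature)
        if key in experimental:
            evidence = Evidence("experimental", experimental[key])
        else:
            evidence = Evidence("predicted", "rule:" + rule.family)
        protein.annotations[rule.feature] = (rule.value, evidence)
    return protein


# Usage with made-up data.
rules = [Rule("FAM_A", "function", "hypothetical kinase activity")]
verified = {("P1", "function"): "literature"}
p1 = annotate(Protein("P1", "FAM_A"), rules, verified)
p2 = annotate(Protein("P2", "FAM_A"), rules, verified)
p3 = annotate(Protein("P3"), rules, verified)
```

In this sketch an unclassified protein such as `p3` receives no annotation at all, which reflects the idea that annotation follows from family membership rather than from the single best sequence hit.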