Automatic Classification of Enzyme Family in Protein Annotation

Authors:
Cássia T. Santos;Ana L. Bazzan;Ney Lemke
Affiliations:
Departamento de Informática, Universidade de Évora, Portugal;Instituto de Informática / PPGC, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil C. P. 15064, 91.501-970;Dep. de Física e Biofísica, Instituto de Biociências, UNESP, Botucatu, SP, Brazil C.P. 510, 18618-000
Venue:
BSB '09 Proceedings of the 4th Brazilian Symposium on Bioinformatics: Advances in Bioinformatics and Computational Biology
Year:
2009

Abstract

Most genome annotation tasks can be at least partially automated. Because annotation is time-consuming, many tools and annotation environments aim to automate parts of the process, freeing the specialist for more valuable tasks. In particular, the annotation of protein function can benefit from knowledge about enzymatic processes. Sequence homology alone is a poor basis for deriving this knowledge when the sequence to be annotated has only a few homologues; the alternative is to use motifs. This paper uses a symbolic machine learning approach to derive rules for classifying enzymes according to the Enzyme Commission (EC) nomenclature. Our results show that, for the top-level EC class, the average global classification error is 3.13%. Our technique also produces a set of rules relating structural to functional information, which is important for understanding a protein's three-dimensional structure and determining its biological function.
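
To make the idea of rule derivation from motifs concrete, the following is a minimal sketch, not the authors' pipeline. It assumes each protein is represented by the set of motifs it contains, and it uses a small ID3-style decision tree (information gain on motif presence or absence) as a stand-in for a symbolic learner. The motif names, the EC labels, and the six training examples are all invented for illustration, and the error reported is a training error on this toy set, unrelated to the 3.13% figure above.

```python
import math
from collections import Counter

# Toy data (hypothetical): each protein is a set of motif identifiers plus
# a top-level EC class label. All names and labels are invented.
TRAIN = [
    ({"motif_A", "motif_B"}, "EC 1"),
    ({"motif_A"}, "EC 1"),
    ({"motif_B", "motif_C"}, "EC 2"),
    ({"motif_C"}, "EC 2"),
    ({"motif_D"}, "EC 3"),
    ({"motif_B", "motif_D"}, "EC 3"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_motif(rows, motifs):
    """Return the motif whose presence/absence split has the highest information gain."""
    base = entropy([label for _, label in rows])
    best, best_gain = None, 0.0
    for m in sorted(motifs):  # sorted for a deterministic tie-break
        present = [lab for feats, lab in rows if m in feats]
        absent = [lab for feats, lab in rows if m not in feats]
        if not present or not absent:
            continue  # this motif does not split the rows
        remainder = (len(present) * entropy(present)
                     + len(absent) * entropy(absent)) / len(rows)
        gain = base - remainder
        if gain > best_gain + 1e-12:
            best, best_gain = m, gain
    return best

def build_tree(rows, motifs):
    """Recursively build a tree: a leaf is a label, a node is (motif, present, absent)."""
    labels = [lab for _, lab in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority
    m = best_motif(rows, motifs)
    if m is None:
        return majority  # no informative motif left
    rest = motifs - {m}
    return (m,
            build_tree([r for r in rows if m in r[0]], rest),
            build_tree([r for r in rows if m not in r[0]], rest))

def classify(tree, feats):
    """Walk the tree for one protein's motif set and return the predicted class."""
    while isinstance(tree, tuple):
        m, if_present, if_absent = tree
        tree = if_present if m in feats else if_absent
    return tree

def rules(tree, conds=()):
    """Flatten the tree into readable IF-THEN rules, one per leaf."""
    if not isinstance(tree, tuple):
        yield "IF " + (" AND ".join(conds) or "TRUE") + " THEN " + tree
        return
    m, if_present, if_absent = tree
    yield from rules(if_present, conds + (f"has {m}",))
    yield from rules(if_absent, conds + (f"lacks {m}",))

def error_rate(tree, rows):
    """Fraction of rows the tree misclassifies."""
    wrong = sum(1 for feats, lab in rows if classify(tree, feats) != lab)
    return wrong / len(rows)

ALL_MOTIFS = set().union(*(feats for feats, _ in TRAIN))
TREE = build_tree(TRAIN, ALL_MOTIFS)
RULES = list(rules(TREE))

if __name__ == "__main__":
    for rule in RULES:
        print(rule)
    print("training error:", error_rate(TREE, TRAIN))
```

Each path from root to leaf reads as one rule linking motif presence (structural evidence) to an enzyme class (function), which is the kind of human-readable output a symbolic learner offers over a black-box classifier. A realistic evaluation would estimate error on held-out data, for example by cross-validation, rather than on the training set used here.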