Peptide programs: applying fragment programs to protein classification

Authors:
Andre O. Falcao;Daniel Faria;António Ferreira
Affiliations:
University of Lisbon, Lisbon, Portugal;University of Lisbon, Lisbon, Portugal;University of Lisbon, Lisbon, Portugal
Venue:
Proceedings of the 2nd international workshop on Data and text mining in bioinformatics
Year:
2008

Abstract

Functional prediction and classification of proteins is a central problem in bioinformatics. Alignment methods are useful but have limitations, which have prompted the development and use of machine learning approaches. However, traditional machine learning approaches cannot exploit sequence data directly; instead, they rely on derived sequence features or kernel functions to obtain a feature space. Because, in theory, all the information necessary to predict a protein's structure and function is contained in its sequence, a methodology that exploits sequence data directly could be advantageous. A novel machine learning methodology for protein classification, inspired by the concept of fragment programs, is presented. This methodology consists of assigning a minimal computer program to each of the 20 amino acids, and then representing a protein as the program that results from sequentially applying the programs of the amino acids that compose its sequence. The basic concepts of the methodology (peptide programs) are discussed, and a framework is proposed for their implementation, including the instruction set, virtual machine, evaluation procedures and convergence methods. The methodology is tested on the binary classification of 33,500 enzymes into 182 distinct Enzyme Commission (EC) classes. The average Matthews correlation coefficient of the binary classifiers is 0.75 in training and 0.68 in validation. Overall, the results demonstrate the potential of the proposed methodology and its ability to extract knowledge from sequence data using very few computational resources.
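To make the idea of a peptide program concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not specify the instruction set, the virtual machine, or the decision rule, so everything here is an assumption made for illustration: the three opcodes, the two-register machine, the placeholder per-residue programs in `toy_programs`, and the rule in `classify` are all hypothetical. In the methodology described above the per-residue programs would be found by a convergence procedure; here they are fixed by hand. The `mcc` function is the standard Matthews correlation coefficient computed from confusion-matrix counts.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def run_instruction(regs, instr):
    """Execute one instruction of a hypothetical toy instruction set."""
    op, reg, operand = instr
    if op == "add":
        regs[reg] += operand
    elif op == "mul":
        regs[reg] *= operand
    elif op == "mov":  # copy register `operand` into register `reg`
        regs[reg] = regs[operand]
    else:
        raise ValueError(f"unknown opcode: {op}")


def toy_programs():
    """Placeholder programs, one per amino acid (assumed, not learned)."""
    programs = {}
    for i, aa in enumerate(AMINO_ACIDS):
        if i % 2 == 0:
            programs[aa] = [("add", 0, float(i + 1))]
        else:
            programs[aa] = [("mul", 0, 0.5), ("add", 1, 1.0)]
    return programs


def run_peptide_program(sequence, programs, n_regs=2):
    """Run the residue programs in sequence order on a fresh register file."""
    regs = [0.0] * n_regs
    for aa in sequence:
        for instr in programs[aa]:
            run_instruction(regs, instr)
    return regs


def classify(sequence, programs, threshold=0.0):
    """Hypothetical decision rule: compare two registers against a threshold."""
    regs = run_peptide_program(sequence, programs)
    return regs[0] - regs[1] > threshold


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    programs = toy_programs()
    for seq in ("AC", "CA", "AD"):
        print(seq, run_peptide_program(seq, programs), classify(seq, programs))
    print("MCC:", mcc(tp=5, tn=5, fp=0, fn=0))
```

Because the toy instruction set mixes `add` and `mul`, the final register state depends on residue order as well as composition, which is the property that lets such a program use the sequence directly rather than a derived feature vector.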