Representation of protein-sequence information by amino acid subalphabets

Authors:
Claus A. F. Andersen; Søren Brunak
Affiliations:
Siena Biotech SpA.; Technical University of Denmark
Venue:
AI Magazine
Year:
2004

Citing 2
Cited 1

Introduction to the theory of neural computation
Bioinformatics: the machine learning approach

Wavelet Analysis in Current Cancer Genome Research: A Survey

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)


Abstract

Within computational biology, algorithms are constructed to extract knowledge from biological data, in particular data generated by the large genome projects, which produce gene and protein sequences in high volume. In this article, we explore new ways of representing protein-sequence information using machine learning strategies, where the primary goal is to discover novel, powerful representations for use in AI techniques. In the case of proteins and the 20 different amino acids they typically contain, a secondary goal is to discover how the current selection of amino acids, now common in proteins, might have emerged from simpler selections, or alphabets, in use earlier during the evolution of living organisms.
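
To make the idea of a subalphabet concrete, the following minimal Python sketch re-encodes a protein sequence using a reduced alphabet, in which several amino acids share one symbol. The grouping `EXAMPLE_GROUPS`, the group symbols, and the function names are hypothetical and chosen only for illustration; they are not the subalphabets or the method reported in the article, which are not given in this abstract.

```python
# Illustrative grouping of the 20 standard amino acids (one-letter codes)
# into 5 classes. This grouping is an assumption made for this example,
# not a result from the article.
EXAMPLE_GROUPS = {
    "h": "AVLIMC",   # broadly hydrophobic
    "a": "FWY",      # aromatic
    "p": "STNQGPH",  # broadly polar or small
    "+": "KR",       # positively charged
    "-": "DE",       # negatively charged
}


def build_mapping(groups):
    """Invert {symbol: residues} into {residue: symbol}, rejecting overlaps."""
    mapping = {}
    for symbol, residues in groups.items():
        for residue in residues:
            if residue in mapping:
                raise ValueError(f"residue {residue!r} is assigned to two groups")
            mapping[residue] = symbol
    return mapping


def reduce_sequence(sequence, mapping, unknown="x"):
    """Re-encode a sequence in the reduced alphabet.

    Residues that are not in the mapping (for example ambiguity codes)
    are replaced by the `unknown` symbol.
    """
    return "".join(mapping.get(residue, unknown) for residue in sequence.upper())


MAPPING = build_mapping(EXAMPLE_GROUPS)
```

With this grouping, `reduce_sequence("MKTAYIAK", MAPPING)` returns `"h+phahh+"`: the eight residues are written with only four distinct symbols, which shrinks the input space a learning method has to handle at the cost of discarding distinctions within each group.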