We study the nonrandomness of proteome sequences by analysing the correlations that arise between amino acids at short and medium range, specifically between amino acids located 10 or 100 residues apart, respectively. We show that statistical models that account for these two types of correlation are more likely to capture the information contained in protein sequences and thus achieve good compression rates. Finally, we propose that the cause of this redundancy is related to the evolutionary origin of proteomes and protein sequences.
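
One way to make the notion of correlation between residues a fixed distance apart concrete is to estimate the mutual information between the amino acids at positions i and i + d. The sketch below is only an illustration under that assumption; it is not the statistical model used in the paper. The function name `mutual_information_at_distance` and the toy sequences are hypothetical. The plug-in estimate shown here is biased upward on short sequences, so values measured on real proteomes would need to be compared against a shuffled-sequence baseline.

```python
from collections import Counter
from math import log2


def mutual_information_at_distance(sequence: str, d: int) -> float:
    """Plug-in estimate (in bits) of the mutual information between
    residues located d positions apart in `sequence`.

    Illustrative helper only, not the model described in the abstract.
    """
    if d <= 0:
        raise ValueError("d must be a positive integer")

    # All residue pairs (x_i, x_{i+d}) observed in the sequence.
    pairs = [(sequence[i], sequence[i + d]) for i in range(len(sequence) - d)]
    if not pairs:
        return 0.0

    n = len(pairs)
    joint = Counter(pairs)                 # counts of (a, b) pairs
    left = Counter(a for a, _ in pairs)    # counts of the first residue
    right = Counter(b for _, b in pairs)   # counts of the second residue

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) simplifies to count * n / (left[a] * right[b]).
        mi += p_ab * log2(count * n / (left[a] * right[b]))
    return mi


if __name__ == "__main__":
    toy = "ACDE" * 50  # hypothetical, perfectly periodic toy sequence
    for d in (1, 4, 10, 100):
        print(d, round(mutual_information_at_distance(toy, d), 3))
```

A value near zero at a given distance means that knowing one residue says little about the other. A compressor whose model conditions on residues at distances where the value is clearly above the shuffled baseline has, in principle, more redundancy to exploit.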