A Lossless Compression Algorithm for DNA sequences

Authors:
Taysir H. A. Soliman;Tarek F. Gharib;Alshaimaa Abo-Alian;M.A. El Sharkawy
Affiliations:
Faculty of Computer and Information, Assiut University, Egypt.;Faculty of Computer and Information Sciences, Ain Shams University, Egypt.;Faculty of Computer and Information Sciences, Ain Shams University, Egypt.;Faculty of Computer and Information Sciences, Ain Shams University, Egypt
Venue:
International Journal of Bioinformatics Research and Applications
Year:
2009

Abstract

The growing volume of DNA sequence data requires efficient computational algorithms for sequence comparison and analysis. Standard compression algorithms cannot compress DNA sequences effectively because they do not consider the special characteristics of DNA sequences (namely, that DNA sequences contain several approximate repeats and complementary palindromes). Recently, new algorithms have been proposed to compress DNA sequences, often by detecting long approximate repeats. The current work proposes a Lossless Compression Algorithm (LCA) that provides a new encoding method. LCA achieves a better compression ratio than the existing DNA-oriented compression algorithms GenCompress, DNACompress, and DNAPack.
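To make the terminology concrete, the following minimal Python sketch illustrates what a complementary palindrome is (a substring whose reverse complement occurs elsewhere in the sequence) and the fixed 2-bits-per-base size that DNA-oriented compressors are usually measured against. It is not the LCA encoding described in the paper; the function names and the example sequence are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: this is NOT the LCA encoding from the paper.
# All names below are hypothetical helpers introduced for this example.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_complementary_palindrome(seq: str, start: int, length: int) -> int:
    """Return the first index at which the reverse complement of
    seq[start:start + length] occurs in seq, or -1 if it does not occur."""
    return seq.find(reverse_complement(seq[start:start + length]))


def two_bit_size_bytes(seq: str) -> int:
    """Bytes needed to store seq at a fixed 2 bits per base (rounded up)."""
    return (2 * len(seq) + 7) // 8


if __name__ == "__main__":
    seq = "AAGCGGGGCTT"
    print(reverse_complement("AAGC"))                 # reverse complement of the first 4 bases
    print(find_complementary_palindrome(seq, 0, 4))   # where that reverse complement occurs
    print(two_bit_size_bytes(seq))                    # fixed-width baseline size in bytes
```

A general-purpose compressor searches only for exact forward repeats, so a region like the one located by `find_complementary_palindrome` looks like new data to it. A DNA-aware scheme can instead encode such a region as a reference to the earlier occurrence.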