Human genomes as email attachments

Authors: Scott Christley, Yiming Lu, Chen Li, Xiaohui Xie
Venue: Bioinformatics
Year: 2009

Cited by (9):

Relative Lempel-Ziv compression of genomes for large-scale storage and retrieval
SPIRE'10: Proceedings of the 17th International Conference on String Processing and Information Retrieval

Iterative Dictionary Construction for Compression of Large DNA Data Sets
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Compressing genomic sequence fragments using SLIMGENE
RECOMB'10: Proceedings of the 14th Annual International Conference on Research in Computational Molecular Biology

Fast relative Lempel-Ziv self-index for similar sequences
FAW-AAIM'12: Proceedings of the 6th International Frontiers in Algorithmics, and Proceedings of the 8th International Conference on Algorithmic Aspects in Information and Management

KungFQ: A Simple and Powerful Approach to Compress fastq Files
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Optimized relative Lempel-Ziv compression of genomes
ACSC '11: Proceedings of the Thirty-Fourth Australasian Computer Science Conference, Volume 113

Practical compression for multi-alignment genomic files
ACSC '13: Proceedings of the Thirty-Sixth Australasian Computer Science Conference, Volume 135

RCSI: scalable similarity search in thousand(s) of genomes
Proceedings of the VLDB Endowment

FRESCO: Referential Compression of Highly Similar Sequences
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Abstract

Summary: The amount of genomic sequence data being generated and made available through public databases continues to increase at an ever-expanding rate. Downloading, copying, sharing and manipulating these large datasets are becoming difficult and time consuming for researchers. We need to consider using advanced compression techniques as part of a standard data format for genomic data. The inherent structure of genome data allows for more efficient lossless compression than can be obtained through the use of generic compression programs. We apply a series of techniques to James Watson's genome that in combination reduce it to a mere 4 MB, small enough to be sent as an email attachment.

Availability: Our algorithms are implemented in C++ and are freely available from http://www.ics.uci.edu/~xhx/project/DNAzip.

Contact: chenli@ics.uci.edu; xhx@ics.uci.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
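The abstract does not list the individual techniques, so the following is only a minimal sketch of one way that "inherent structure" can be exploited. It assumes (this is an assumption, not a description of the DNAzip format) that a personal genome is stored as a sorted list of variant positions relative to a reference genome. Because neighbouring positions are close together, storing each position as the gap from the previous one and writing those gaps as variable-length integers (7 bits per byte, high bit meaning "more bytes follow") takes far fewer bytes than fixed 32-bit integers. All function names and the example coordinates are hypothetical. Running the script prints `28 bytes as 32-bit ints, 12 bytes as delta varints`.

```python
"""Hypothetical sketch: delta + variable-length integer (varint) encoding of
sorted variant positions. An illustration only, not the DNAzip file format."""


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer 7 bits per byte; the high bit marks 'more bytes follow'."""
    if n < 0:
        raise ValueError("varint encoding needs a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte of this integer
            return bytes(out)


def decode_varints(data: bytes) -> list[int]:
    """Decode a concatenation of varints back into integers."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def encode_positions(positions: list[int]) -> bytes:
    """Store each sorted position as the gap from the previous one, then varint-encode the gaps."""
    out = bytearray()
    previous = 0
    for pos in positions:
        if pos < previous:
            raise ValueError("positions must be sorted in ascending order")
        out += encode_varint(pos - previous)
        previous = pos
    return bytes(out)


def decode_positions(data: bytes) -> list[int]:
    """Invert encode_positions by taking a running sum of the decoded gaps."""
    positions, total = [], 0
    for gap in decode_varints(data):
        total += gap
        positions.append(total)
    return positions


if __name__ == "__main__":
    # Made-up variant coordinates on one chromosome (illustrative, not real data).
    snp_positions = [10_177, 10_352, 11_008, 13_110, 13_116, 14_599, 14_604]
    packed = encode_positions(snp_positions)
    assert decode_positions(packed) == snp_positions  # lossless round trip
    print(f"{len(snp_positions) * 4} bytes as 32-bit ints, {len(packed)} bytes as delta varints")
```

The saving comes entirely from small gaps: any gap below 128 fits in one byte and any gap below 16,384 in two. A complete compressor would pair a step like this with others (for example, encoding the variant alleles themselves), which the abstract only refers to as "a series of techniques".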