A Comprehensive Analysis Workflow for Genome-Wide Screening Data from ChIP-Sequencing Experiments

Authors:
Hatice Gulcin Ozer;Doruk Bozdağ;Terry Camerlengo;Jiejun Wu;Yi-Wen Huang;Tim Hartley;Jeffrey D. Parvin;Tim Huang;Umit V. Catalyurek;Kun Huang
Affiliations:
Department of Biomedical Informatics, The Ohio State University, and The Ohio State University Comprehensive Cancer Center Biomedical Informatics Shared Resource;Department of Biomedical Informatics, The Ohio State University, and Department of Electrical & Computer Engineering, The Ohio State University;Department of Biomedical Informatics, The Ohio State University;Department of Molecular Virology, Immunology, The Ohio State University, Columbus, USA 43210;Department of Molecular Virology, Immunology, The Ohio State University, Columbus, USA 43210;Department of Biomedical Informatics, The Ohio State University;Department of Biomedical Informatics, The Ohio State University, and The Ohio State University Comprehensive Cancer Center Biomedical Informatics Shared Resource;Department of Molecular Virology, Immunology, The Ohio State University, Columbus, USA 43210;Department of Biomedical Informatics, The Ohio State University;Department of Biomedical Informatics, The Ohio State University, and The Ohio State University Comprehensive Cancer Center Biomedical Informatics Shared Resource
Venue:
BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology
Year:
2009


Abstract

ChIP-sequencing is a new technique that generates short DNA sequence reads for analyzing DNA-protein interactions in genome-wide studies. Although several studies have addressed the processing and analysis of ChIP-sequencing data, a complete workflow has not yet been reported. The main challenges in establishing such a workflow are the size of the data and the broad range of biological questions it must serve. In this paper, we present the ChIP-sequencing data analysis workflow that we developed at the Ohio State University Comprehensive Cancer Center Bioinformatics Shared Resources. The pipeline comprises 1) alignment of short sequence reads to the reference genome using different mapping algorithms such as Eland, MapReads, SeqMap, and RMAP; 2) a novel normalization algorithm to detect significant binding densities and to compare binding densities across experiments; 3) gene database mapping and 3D binding density visualization; and 4) distributed computing and high performance computing (HPC) support.