Predicting the in vivo signature of human gene regulatory sequences

  • Authors:
  • William Stafford Noble; Scott Kuehn; Robert Thurman; Man Yu; John Stamatoyannopoulos

  • Affiliations:
  • Department of Genome Sciences and Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA; Department of Molecular Biology, Regulome, 2211 Elliott Avenue, Suite 600, Seattle, WA 98121, USA

  • Venue:
  • Bioinformatics
  • Year:
  • 2005

Abstract

Motivation: In the living cell nucleus, genomic DNA is packaged into chromatin. DNA sequences that regulate transcription and other chromosomal processes are associated with local disruptions, or 'openings', in chromatin structure caused by the cooperative action of regulatory proteins. Such perturbations are extremely specific for cis-regulatory elements and occur over short stretches of DNA (typically ∼250 bp). They can be detected experimentally as DNaseI hypersensitive sites (HSs) in vivo, though the process is extremely laborious and costly. The ability to discriminate DNaseI HSs computationally would have a major impact on the annotation and utilization of the human genome.

Results: We found that a supervised pattern recognition algorithm, trained using a set of 280 DNaseI HS and 737 non-HS control sequences from erythroid cells, was capable of de novo prediction of HSs across the human genome with surprisingly high accuracy determined by prospective in vivo validation. Systematic application of this computational approach will greatly facilitate the discovery and analysis of functional non-coding elements in the human and other complex genomes.

Availability: Supplementary data is available at noble.gs.washington.edu/proj/hs

Contact: noble@gs.washington.edu; jstam@regulome.com
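
Illustration: The abstract does not say which supervised algorithm or sequence features were used, so the sketch below is only a hedged illustration of the general idea of training a classifier on labelled HS and non-HS sequences. It swaps in a nearest-centroid classifier over k-mer frequencies, which is an assumption and not the authors' method. The names `kmer_vector`, `train` and `predict`, the value `K = 3`, and the toy sequences are all hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import product
import math

K = 3  # hypothetical k-mer length; the abstract does not state the features used
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 64 possible 3-mers


def kmer_vector(seq, k=K):
    """Return the k-mer frequency vector of one DNA sequence (ordered as KMERS)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in KMERS) or 1  # avoid division by zero
    return [counts[m] / total for m in KMERS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(hs_seqs, non_hs_seqs):
    """Summarize each labelled class by the centroid of its k-mer vectors."""
    return (centroid([kmer_vector(s) for s in hs_seqs]),
            centroid([kmer_vector(s) for s in non_hs_seqs]))


def predict(model, seq):
    """Assign the label whose class centroid is closer in Euclidean distance."""
    hs_centroid, non_hs_centroid = model
    v = kmer_vector(seq)
    if math.dist(v, hs_centroid) < math.dist(v, non_hs_centroid):
        return "HS"
    return "non-HS"


# Toy training sequences, invented purely for illustration.
hs_train = ["GCGCGCGCGCGC", "CGCGCGCGCGCG"]
non_hs_train = ["ATATATATATAT", "TATATATATATA"]
model = train(hs_train, non_hs_train)

print(predict(model, "GCGCGCGG"))
print(predict(model, "ATATATTA"))
```

In a realistic setting the training lists would hold the 280 HS and 737 non-HS sequences mentioned in the Results, and accuracy would be estimated on held-out data rather than on hand-picked queries.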