SMASHing regulatory sites in DNA by human-mouse sequence comparisons

  • Authors:
  • Mihaela Zavolan; Nicholas D. Socci; Nikolaus Rajewsky; Terry Gaasterland

  • Venue:
  • CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
  • Year:
  • 2003

Abstract

Regulatory sequence elements provide important clues to understanding and predicting gene expression. Although the binding sites for hundreds of transcription factors are known, there has been no systematic attempt to incorporate this information in the annotation of the human genome. Cross species sequence comparisons are critical to a meaningful annotation of regulatory elements since they generally reside in conserved non-coding regions. To take advantage of the recently completed drafts of the mouse and human genomes for annotating transcription factor binding sites, we developed SMASH, a computational pipeline that identifies thousands of orthologous human/mouse proteins, maps them to genomic sequences, extracts and compares upstream regions and annotates putative regulatory elements in conserved, non-coding, upstream regions. Our current dataset consists of approximately 2500 human/mouse gene pairs. Transcription start sites were estimated by mapping quasi-full length cDNA sequences. SMASH uses a novel probabilistic method to identify putative conserved binding sites that takes into account the competition between transcription factors for binding DNA. SMASH presents the results via a genome browser web interface which displays the predicted regulatory information together with the current annotations for the human genome. Our results are validated by comparison to previously published experimental data. SMASH results compare favorably to other existing computational approaches.
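
To make the idea of annotating binding sites in conserved upstream regions concrete, here is a minimal sketch. It is not the SMASH method: the abstract does not specify the probabilistic model, so this example swaps in a much simpler technique, a log-odds scan with a position weight matrix restricted to windows that are identical and ungapped in an aligned human/mouse pair. It does not model competition between transcription factors. The names `TOY_PWM`, `log_odds`, and `conserved_sites`, the toy motif, the uniform background, and the threshold are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical 4-column motif (per-position base probabilities).
# These numbers are invented for the example, not taken from the paper.
TOY_PWM: List[Dict[str, float]] = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def log_odds(site: str, pwm: List[Dict[str, float]],
             background: float = 0.25) -> float:
    """Log2-odds of a site under the motif versus a uniform background."""
    return sum(math.log2(col[base] / background)
               for col, base in zip(pwm, site))


def conserved_sites(human: str, mouse: str, pwm: List[Dict[str, float]],
                    threshold: float) -> List[Tuple[int, str, float]]:
    """Scan two pre-aligned sequences (equal length, '-' marks a gap).

    Returns (alignment offset, site, score) for every window that is
    identical in both species, contains only A/C/G/T, and scores at or
    above the threshold.
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    width = len(pwm)
    hits = []
    for i in range(len(human) - width + 1):
        h_win = human[i:i + width]
        m_win = mouse[i:i + width]
        # Keep only windows conserved exactly between the two species.
        if h_win != m_win:
            continue
        # Skip gaps and ambiguous bases.
        if any(base not in "ACGT" for base in h_win):
            continue
        score = log_odds(h_win, pwm)
        if score >= threshold:
            hits.append((i, h_win, score))
    return hits
```

With the toy alignment `human = "TTAGCTAACCAGCTGG"` and `mouse = "TTAGCTAA-CAGGTGG"`, the motif match `AGCT` occurs twice in the human sequence, but only the first occurrence is conserved in mouse, so only that one would be reported. A competition-aware model such as the one the abstract describes would go further and weigh overlapping candidate sites for different factors against each other.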