Alignments anchored on genomic landmarks can aid in the identification of regulatory elements

  • Authors:
  • Kannan Tharakaraman; Leonardo Mariño-Ramírez; Sergey Sheetlin; David Landsman; John L. Spouge

  • Affiliations:
  • Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health Building 38A, 8600 Rockville Pike, Bethesda, MD 20894-6075 ...

  • Venue:
  • Bioinformatics
  • Year:
  • 2005

Abstract

Motivation: The transcription start site (TSS) has been located for an increasing number of genes across several organisms. Statistical tests have shown that some cis-acting regulatory elements have positional preferences with respect to the TSS, but few strategies have emerged for locating elements by their positional preferences. This paper elaborates such a strategy. First, we align promoter regions without gaps, anchoring the alignment on each promoter's TSS. Second, we apply a novel word-specific mask. Third, we apply a clustering test related to gapless BLAST statistics. The test examines whether any specific word is placed unusually consistently with respect to the TSS. Finally, our program A-GLAM, an extension of the GLAM program, uses significant word positions as new 'anchors' to realign the sequences. A Gibbs sampling algorithm then locates putative cis-acting regulatory elements. Usually, Gibbs sampling requires a preliminary masking step to avoid convergence onto a dominant but uninteresting signal from a DNA repeat. However, since the positional anchors focus A-GLAM on the motif of interest, masking DNA repeats during Gibbs sampling becomes unnecessary.

Results: In a set of human DNA sequences with experimentally characterized TSSs, the placement of 791 octonucleotide words was unusually consistent (multiple test corrected P ...

Availability: The A-GLAM program and a list of statistically significant words are available at ftp://ftp.ncbi.nih.gov/pub/spouge/papers/archive/AGLAM/.

Contact: spouge@ncbi.nlm.nih.gov
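To make the anchoring idea in the Motivation concrete, the following is a minimal Python sketch, not the A-GLAM implementation. It records where each word falls relative to the TSS in a gapless, TSS-anchored stack of promoters, then flags words whose occurrences pile up within a narrow window of offsets. The function names, the `width` and `min_count` parameters, and the window-count criterion are all assumptions made for illustration. In particular, the naive window count stands in for the paper's clustering test related to gapless BLAST statistics, and it produces no P-value, no word-specific mask, and no multiple-test correction.

```python
from collections import defaultdict


def word_positions(promoters, k=8):
    """Map each k-letter word to the TSS-relative offsets where it occurs.

    `promoters` is a list of (sequence, tss_index) pairs (hypothetical input
    format); offset 0 is the TSS and negative offsets lie upstream.
    """
    positions = defaultdict(list)
    for seq, tss in promoters:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Skip words containing ambiguous bases such as N.
            if set(word) <= set("ACGT"):
                positions[word].append(i - tss)
    return positions


def max_window_count(offsets, width):
    """Largest number of offsets that fit in any window of the given width."""
    offsets = sorted(offsets)
    best = 0
    left = 0
    for right, off in enumerate(offsets):
        # Shrink the window from the left until it spans at most `width`.
        while off - offsets[left] > width:
            left += 1
        best = max(best, right - left + 1)
    return best


def positionally_consistent_words(promoters, k=8, width=5, min_count=3):
    """Return {word: count} for words clustered at similar TSS offsets.

    Assumption: a plain count threshold replaces the paper's statistical test.
    """
    hits = {}
    for word, offsets in word_positions(promoters, k).items():
        count = max_window_count(offsets, width)
        if count >= min_count:
            hits[word] = count
    return hits
```

Called as `positionally_consistent_words(promoters, k=8)`, the sketch returns candidate words whose offsets could then serve as new anchors for realignment, in the spirit of the final step described above. Note that it counts every occurrence, including repeated occurrences within a single promoter, which a real test would need to handle more carefully.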