Establishing a statistic model for recognition of steroid hormone response elements

  • Authors:
  • Maria Stepanova; Feng Lin; Valerie C.-L. Lin

  • Affiliations:
  • Bioinformatics Research Centre, Nanyang Technological University, Nanyang Avenue, 50 Nanyang Drive, Singapore 639798, Singapore;Bioinformatics Research Centre, Nanyang Technological University, Nanyang Avenue, 50 Nanyang Drive, Singapore 639798, Singapore;School of Biological Sciences, Nanyang Technological University, Nanyang Avenue, 50 Nanyang Drive, Singapore 639798, Singapore

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2006

Abstract

Identification of hormone response elements (HREs) is essential for understanding the mechanism of hormone-regulated gene expression. To date, effective bioinformatics tools for recognizing specific HREs, such as progesterone response elements (PREs), have been lacking. In this paper, in silico methods are surveyed and compared in order to establish a more accurate statistical model. The homogeneity of steroid HREs is analyzed, and a reliable training dataset is constructed by searching more than 150 literature sources for experimentally validated response elements. Based on the observation that verified HREs show di-nucleotide preservation relative to uniform nucleotide distributions, both mono- and di-nucleotide position weight matrices (PWMs) are computed to extract the statistical pattern of the positions. Sequence transition patterns are then recognized using a specifically designed profile hidden Markov model. Reciprocal combination of the statistical and transition patterns significantly improves the model's performance, yielding higher sensitivity and specificity. Once putative response elements have been identified in the promoter regions of vertebrate genes, a qualitative scheme is applied to assess the probability that each gene is a primary hormone target. Using 650 records of experimentally validated steroid hormone response elements, the model reaches a sensitivity of 73% and a specificity of one prediction per 8.24 kb, which allows it to be used for further prediction of primary target genes through analysis of their upstream promoters in human or other vertebrate genomes of interest. Additional documents, supplementary data and the web-based program developed for response element prediction are freely available for academic research at http://birc.ntu.edu.sg/~pmaria/. Putative gene promoter regions submitted for recognition of potential regulatory PREs can be up to 5 kb long.
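
The abstract does not specify how the position weight matrices are computed, so the following is only a minimal sketch of a generic log-odds position weight matrix, built over single nucleotides (k = 1) or overlapping di-nucleotides (k = 2) against a uniform background. The function names, the pseudocount value and the toy binding sites are assumptions made for illustration; they are not the authors' implementation or data, and the profile hidden Markov model stage is not covered.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def build_pwm(sites, k=1, pseudocount=0.5):
    """Log-odds weight matrix over k-mers (k=1 mono-, k=2 di-nucleotide).

    Assumes aligned, equal-length, uppercase ACGT sites and a uniform
    background; the pseudocount value is an illustrative choice.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    background = 1.0 / len(kmers)
    total = len(sites) + pseudocount * len(kmers)
    pwm = []
    for pos in range(length - k + 1):
        counts = Counter(site[pos:pos + k] for site in sites)
        column = {
            kmer: math.log2(((counts[kmer] + pseudocount) / total) / background)
            for kmer in kmers
        }
        pwm.append(column)
    return pwm


def score(pwm, window, k=1):
    """Sum the log-odds of each k-mer in a window of matching width."""
    return sum(column[window[i:i + k]] for i, column in enumerate(pwm))


def scan(pwm, sequence, k=1):
    """Score every window of a longer sequence; returns (offset, score) pairs."""
    width = len(pwm) + k - 1
    return [
        (i, score(pwm, sequence[i:i + width], k))
        for i in range(len(sequence) - width + 1)
    ]


# Toy aligned sites, invented for this example (not the paper's training set).
toy_sites = ["AGAACA", "AGAACA", "GGTACA", "AGCACA"]
mono = build_pwm(toy_sites, k=1)
di = build_pwm(toy_sites, k=2)

hits = scan(mono, "TTTTAGAACATTTT", k=1)
best_offset, best_score = max(hits, key=lambda hit: hit[1])
```

Scanning a promoter with both matrices and thresholding the scores would give candidate sites. How the paper combines these scores with the hidden Markov model output is not described in the abstract and is left out here.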