MicroRNA target detection and analysis for genes related to breast cancer using MDLcompress

  • Authors:
  • Scott C. Evans; Antonis Kourtidis; T. Stephen Markham; Jonathan Miller; Douglas S. Conklin; Andrew S. Torres

  • Affiliations:
  • GE Global Research, One Research Circle, Niskayuna, NY; GenNYSis Center for Excellence in Cancer Genomics, University at Albany, State University of New York, Rensselaer, NY; GE Global Research, Niskayuna, NY; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX; GenNYSis Center for Excellence in Cancer Genomics, University at Albany, State University of New York, Rensselaer, NY; GE Global Research, Niskayuna, NY

  • Venue:
  • EURASIP Journal on Bioinformatics and Systems Biology
  • Year:
  • 2007

Abstract

We describe initial results of microRNA (miRNA) sequence analysis with the optimal symbol compression ratio (OSCR) algorithm and recast this grammar inference algorithm as an improved minimum description length (MDL) learning tool: MDLcompress. We apply this tool to explore the relationship between miRNAs, single nucleotide polymorphisms (SNPs), and breast cancer. Our new algorithm outperforms other grammar-based coding methods, such as DNA Sequitur, while retaining a two-part code that highlights biologically significant phrases. The deep recursion of MDLcompress, together with its explicit two-part coding, enables it to identify biologically meaningful sequences without needlessly restrictive priors. Because the MDL model quantifies the cost of each phrase in bits, it allows prediction of regions where SNPs may have the most impact on biological activity. MDLcompress improves on our previous algorithm in execution time, through an innovative data structure, and in specificity of motif detection (compression), through improved heuristics. An MDLcompress analysis of 144 overexpressed genes from the breast cancer cell line BT474 has identified novel motifs, including potential miRNA binding sites that are candidates for experimental validation.
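To make the idea of a two-part MDL code concrete, here is a minimal Python sketch of greedy grammar inference on a DNA string. It is not MDLcompress or OSCR. It assumes a deliberately simplified cost model: every token (a nucleotide or a phrase symbol) costs log2 of the current alphabet size in bits, the model part spells out each phrase plus one terminator token, and phrases are chosen greedily by total description length rather than by OSCR's symbol compression ratio heuristic. All function names (`description_length`, `greedy_mdl`, `expand`, and so on) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch, NOT the authors' MDLcompress/OSCR implementation.
# Assumed cost model: every token (base or phrase symbol) costs
# log2(current alphabet size) bits; each rule is spelled out in tokens
# plus one terminator token.


def description_length(tokens, rules, base_alphabet_size=4):
    """Two-part code length in bits: model (rules) + data (rewritten sequence)."""
    bits_per_token = math.log2(base_alphabet_size + len(rules))
    model_bits = sum(len(rhs) + 1 for rhs in rules.values()) * bits_per_token
    data_bits = len(tokens) * bits_per_token
    return model_bits + data_bits


def replace_phrase(tokens, phrase, symbol):
    """Replace non-overlapping occurrences of `phrase` (a tuple), scanning left to right."""
    out, i, n = [], 0, len(phrase)
    while i < len(tokens):
        if tuple(tokens[i:i + n]) == phrase:
            out.append(symbol)
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def candidate_phrases(tokens, min_len=2, max_len=8):
    """Substrings of length min_len..max_len occurring at least twice (overlaps counted)."""
    counts = Counter(
        tuple(tokens[i:i + k])
        for k in range(min_len, max_len + 1)
        for i in range(len(tokens) - k + 1)
    )
    return [p for p, c in counts.items() if c >= 2]


def greedy_mdl(sequence, max_rules=10):
    """Greedily add the phrase that most reduces the total description length."""
    tokens, rules = list(sequence), {}
    best_cost = description_length(tokens, rules)
    for _ in range(max_rules):
        symbol = f"R{len(rules)}"  # multi-character name cannot clash with a single base
        best = None
        for phrase in candidate_phrases(tokens):
            trial_tokens = replace_phrase(tokens, phrase, symbol)
            trial_rules = {**rules, symbol: phrase}
            cost = description_length(trial_tokens, trial_rules)
            if cost < best_cost and (best is None or cost < best[0]):
                best = (cost, phrase, trial_tokens)
        if best is None:  # no phrase shortens the two-part code, so stop
            break
        best_cost, phrase, tokens = best
        rules[symbol] = phrase
    return tokens, rules, best_cost


def expand(tokens, rules):
    """Decode by recursively substituting rule symbols back to bases."""
    out = []
    for t in tokens:
        if t in rules:
            out.extend(expand(list(rules[t]), rules))
        else:
            out.append(t)
    return out


if __name__ == "__main__":
    seq = "ACGTACGTACGTTTGACGTACGT"
    tokens, rules, bits = greedy_mdl(seq)
    print(rules, tokens, round(bits, 2))
    assert "".join(expand(tokens, rules)) == seq  # the two-part code is lossless
```

Because candidate phrases are drawn from the already rewritten token list, later rules may contain earlier rule symbols, giving a small-scale version of hierarchical phrase structure. The drop in `description_length` when a rule is accepted is that phrase's saving in bits, which is the kind of per-phrase cost an MDL analysis can compare across sequence regions.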