PATMAP: polyadenylation site identification from next-generation sequencing data

  • Authors:
  • Xiaohui Wu;Meishuang Tang;Junfeng Yao;Shuiyuan Lin;Zhe Xiang;Guoli Ji

  • Affiliations:
  • Department of Automation, Xiamen University, Xiamen, China;Modern Educational Technical and Practical Training Center, Xiamen University, Xiamen, China;Software School, Xiamen University, Xiamen, China;Modern Educational Technical and Practical Training Center, Xiamen University, Xiamen, China;Department of Automation, Xiamen University, Xiamen, China;Department of Automation, Xiamen University, Xiamen, China

  • Venue:
  • HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part I
  • Year:
  • 2012

Abstract

Polyadenylation is an essential post-transcriptional processing step in the maturation of eukaryotic mRNA. The coming flood of next-generation sequencing (NGS) data creates new opportunities for intensive study of polyadenylation. We present an automated workflow called PATMAP that identifies polyadenylation sites (poly(A) sites) by integrating NGS data cleaning, processing, mapping, normalization and clustering. First, the concept of an ambiguous region was introduced to parse the genome annotation. Then a series of Perl scripts was integrated to iteratively map single-end or paired-end sequences to the reference genome. After mapping, the poly(A) tags (PATs) at the same coordinate were grouped into one cleavage site, and internal priming artifacts were removed. Finally, the cleavage sites from different samples were normalized by an MA-based method and clustered into poly(A) clusters (PACs) by an empirical Bayesian method. The effectiveness of PATMAP was demonstrated by identifying thousands of reliable PACs from millions of NGS sequences in Arabidopsis and yeast.
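
The grouping step described in the abstract (PATs at the same coordinate become one cleavage site) can be illustrated with a minimal Python sketch. This is not PATMAP's code, which is written in Perl. The function names, the tuple layout `(chromosome, strand, coordinate)`, and the `max_gap` parameter are all assumptions made for illustration. The second function merges nearby sites with a plain distance threshold, which stands in for, and is much simpler than, the empirical Bayesian clustering the abstract names.

```python
from collections import Counter


def group_pats(pats):
    """Count poly(A) tags (PATs) per cleavage site.

    Each PAT is a hypothetical (chromosome, strand, coordinate) tuple;
    tags sharing all three fields are treated as one cleavage site.
    """
    return Counter(pats)


def merge_nearby_sites(sites, max_gap=24):
    """Merge cleavage sites into clusters by distance.

    Illustrative stand-in only: a site joins the previous cluster when it is
    on the same chromosome and strand and lies within max_gap nucleotides of
    that cluster's last site. max_gap=24 is an arbitrary assumed value.
    """
    clusters = []
    # Sorting the keys keeps sites on the same chromosome and strand adjacent.
    for (chrom, strand, pos), count in sorted(sites.items()):
        last = clusters[-1] if clusters else None
        if (
            last is not None
            and last["chrom"] == chrom
            and last["strand"] == strand
            and pos - last["end"] <= max_gap
        ):
            last["end"] = pos
            last["count"] += count
        else:
            clusters.append(
                {"chrom": chrom, "strand": strand,
                 "start": pos, "end": pos, "count": count}
            )
    return clusters


# Made-up example tags, not data from the paper.
example_pats = [
    ("chr1", "+", 100),
    ("chr1", "+", 100),
    ("chr1", "+", 110),
    ("chr1", "+", 500),
    ("chr1", "-", 105),
]
example_sites = group_pats(example_pats)
example_clusters = merge_nearby_sites(example_sites)
```

In this toy input, the two tags at position 100 collapse into one site, and the plus-strand sites at 100 and 110 fall into the same cluster, while the minus-strand site at 105 stays separate because clusters are strand-specific.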