Classification of Tandem Repeats in the Human Genome

  • Authors:
  • Yupu Liang;Dina Sokol;Sarah Zelikovitz;Sarah Ita Levitan

  • Affiliations:
  • Department of Computer Science, City University of New York, New York, NY, USA;Department of Computer and Information Science, Brooklyn College of CUNY, Brooklyn, NY, USA;Department of Computer Science, College of Staten Island of CUNY, Staten Island, NY, USA;Department of Computer Science, Brooklyn College of CUNY, Brooklyn, NY, USA

  • Venue:
  • International Journal of Knowledge Discovery in Bioinformatics
  • Year:
  • 2012

Abstract

Tandem repeats in DNA sequences are highly relevant to biological phenomena and to diagnostic tools. Computational programs that discover tandem repeats generate a huge volume of data, which is often difficult to interpret without further organization. In this paper, the authors describe a new method for post-processing tandem repeats through clustering and classification. Their work presents multiple ways of representing tandem repeats using the n-gram model, combined with different clustering distance measures. Analysis of the clusters for the tandem repeats in the human genome shows that the method yields a well-defined grouping in which similarity among repeats is apparent. The authors' new, alignment-free method facilitates analysis of the myriad tandem repeats that occur in the human genome, and they believe that this work will lead to new discoveries about the roles, origins, and significance of tandem repeats.
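
To make the idea of an alignment-free, n-gram-based comparison concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not say which n-gram sizes, distance measures, or clustering algorithm were used, so the choice of trigrams, cosine distance, a threshold of 0.3, and simple single-linkage grouping are all assumptions made for illustration. The function names (ngram_profile, cosine_distance, single_linkage_clusters) are hypothetical as well.

```python
from collections import Counter
from math import sqrt


def ngram_profile(seq, n=3):
    """Count the overlapping n-grams of a DNA string (n=3 is an assumed default)."""
    seq = seq.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def cosine_distance(p, q):
    """Cosine distance between two n-gram count profiles (one possible measure)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        # A sequence shorter than n has an empty profile; treat it as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def single_linkage_clusters(seqs, n=3, threshold=0.3):
    """Group sequences whose profiles lie within `threshold`, transitively.

    Uses a small union-find structure; the threshold value is illustrative only.
    """
    profiles = [ngram_profile(s, n) for s in seqs]
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if cosine_distance(profiles[i], profiles[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, s in enumerate(seqs):
        clusters.setdefault(find(i), []).append(s)
    return list(clusters.values())


if __name__ == "__main__":
    # Made-up example repeats, not data from the paper.
    repeats = ["CAGCAGCAGCAG", "AGCAGCAGCAGC", "ATATATATATAT", "TATATATATA"]
    for group in single_linkage_clusters(repeats):
        print(group)
```

Because the profiles count n-grams rather than aligning positions, two copies of the same repeat that start at different phases (CAG... versus AGC...) still end up close together, which is the kind of property an alignment-free representation is meant to provide.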