Classification of Tandem Repeats in the Human Genome

Authors:
Yupu Liang;Dina Sokol;Sarah Zelikovitz;Sarah Ita Levitan
Affiliations:
Department of Computer Science, City University of New York, New York, NY, USA;Department of Computer and Information Science, Brooklyn College of CUNY, Brooklyn, NY, USA;Department of Computer Science, College of Staten Island of CUNY, Staten Island, NY, USA;Department of Computer Science, Brooklyn College of CUNY, Brooklyn, NY, USA
Venue:
International Journal of Knowledge Discovery in Bioinformatics
Year:
2012

Abstract

Tandem repeats in DNA sequences are highly relevant to biological phenomena and to diagnostic tools. Programs that detect tandem repeats generate a huge volume of data, which is often difficult to interpret without further organization. In this paper, the authors describe a new method for post-processing tandem repeats through clustering and classification. The work presents several ways of representing tandem repeats with the n-gram model, combined with different distance measures for clustering. Analysis of the resulting clusters for the tandem repeats in the human genome shows that the method yields a well-defined grouping in which similarity among repeats is apparent. This new, alignment-free method facilitates the analysis of the many tandem repeats that occur in the human genome, and the authors believe that it will lead to new discoveries about the roles, origins, and significance of tandem repeats.
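The general idea of an alignment-free, n-gram representation can be illustrated with a small sketch. It assumes that each repeat is encoded as a vector of overlapping 3-gram counts over the DNA alphabet, and it compares two interchangeable distance measures (cosine and Euclidean). For grouping, it uses a simple threshold-based single-linkage step. The names `ngram_profile`, `cosine_distance`, `euclidean_distance` and `single_linkage_groups`, the choice n = 3, and the 0.2 threshold are all hypothetical illustrations. They are not the authors' exact representation or clustering algorithm.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: n-grams containing other symbols (e.g. N) are ignored


def ngram_profile(seq: str, n: int = 3) -> list:
    """Count overlapping n-grams of seq, in a fixed order over all 4**n DNA n-grams."""
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    return [counts["".join(g)] for g in product(ALPHABET, repeat=n)]


def cosine_distance(u, v) -> float:
    """1 - cosine similarity; depends on n-gram proportions, not repeat length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def euclidean_distance(u, v) -> float:
    """Euclidean distance between raw count vectors; sensitive to repeat length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def single_linkage_groups(seqs, n=3, threshold=0.2, dist=cosine_distance):
    """Hypothetical grouping: join two repeats whenever their distance is <= threshold,
    then return the connected components (single linkage) as sorted lists of indices."""
    profiles = [ngram_profile(s, n) for s in seqs]
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if dist(profiles[i], profiles[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    repeats = ["ACGACGACGACG", "CGACGACGACGA", "TTAGGGTTAGGGTTAGGG"]
    p = [ngram_profile(s) for s in repeats]
    print(cosine_distance(p[0], p[1]))  # small: rotations of the same ACG motif
    print(cosine_distance(p[0], p[2]))  # 1.0: no 3-grams in common
    print(single_linkage_groups(repeats))
```

In the usage example, the two rotations of the `ACG` motif end up about 0.029 apart under cosine distance and form one group, while the `TTAGGG` repeat forms a group of its own (`[[0, 1], [2]]`). Cosine distance is shown alongside Euclidean distance because it ignores how many copies a repeat has, whereas raw-count Euclidean distance does not. This is one reason why the choice of distance measure can change the resulting clusters.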