Tandem repeats in DNA sequences are important in many biological phenomena and in diagnostic tools. Programs that detect tandem repeats generate a large volume of output, which is often difficult to interpret without further organization. In this paper, the authors describe a new method for post-processing tandem repeats through clustering and classification. They present several ways of representing tandem repeats with the n-gram model, paired with different clustering distance measures. Analysis of the resulting clusters for tandem repeats in the human genome shows that the method yields well-defined groups in which the similarity among repeats is apparent. This alignment-free method facilitates analysis of the myriad tandem repeats that occur in the human genome, and the authors believe the work will lead to new discoveries about the roles, origins, and significance of tandem repeats.
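As a rough illustration of the n-gram idea, the sketch below turns each repeat into a normalized n-gram frequency vector and compares the vectors with two distance measures. The abstract does not say which n-gram size, distance measures, or clustering algorithm the authors use, so the choice of n = 3, the Euclidean and cosine distances, the function names, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes


def ngram_profile(seq, n=3):
    """Return the normalized n-gram frequency vector of a DNA string.

    The vector has one entry per possible n-gram over ALPHABET
    (4**n entries), in lexicographic order.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    keys = ["".join(p) for p in product(ALPHABET, repeat=n)]
    total = sum(counts[k] for k in keys)
    if total == 0:
        return [0.0] * len(keys)
    return [counts[k] / total for k in keys]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy repeats: r1 and r2 differ by one base, r3 is unrelated.
repeats = {
    "r1": "ACGACGACGACG",
    "r2": "ACGACGACGACC",
    "r3": "TTTATTTATTTA",
}
profiles = {name: ngram_profile(s) for name, s in repeats.items()}

d12 = cosine_distance(profiles["r1"], profiles["r2"])
d13 = cosine_distance(profiles["r1"], profiles["r3"])
e12 = euclidean(profiles["r1"], profiles["r2"])
e13 = euclidean(profiles["r1"], profiles["r3"])
print(f"cosine:    r1-r2 = {d12:.3f}, r1-r3 = {d13:.3f}")
print(f"euclidean: r1-r2 = {e12:.3f}, r1-r3 = {e13:.3f}")
```

Because the profiles are fixed-length vectors, no sequence alignment is needed to compare two repeats, and a matrix of such pairwise distances could be passed to any standard clustering routine. Under both distances, the near-identical pair r1 and r2 comes out closer than the unrelated pair r1 and r3.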