This paper addresses techniques for clustering texts by comparing their N-gram vocabularies. In contrast to the regular N-gram approach, the proposed method counts imperfect occurrences of N-grams in a text, that is, occurrences that match up to a given number of mismatches. We demonstrate that this approach substantially improves the resolving capacity of the N-gram method for DNA texts. Additionally, we discuss a scheme for using different types of clustering techniques together to verify the quality of a partition.
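To make the idea of imperfect occurrences concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a "mismatch" means a single-character substitution (Hamming distance) and that the alphabet is the four DNA letters. The function names `exact_ngram_counts` and `imperfect_ngram_counts` and the parameter `max_mismatches` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def exact_ngram_counts(text: str, n: int) -> Counter:
    """Regular N-gram vocabulary: count every length-n window of the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def imperfect_ngram_counts(text: str, n: int, max_mismatches: int,
                           alphabet: str = "ACGT") -> dict:
    """Count, for each possible N-gram, the windows of the text that match it
    with at most max_mismatches substitutions.

    Assumption: mismatches are measured by Hamming distance. This brute-force
    version enumerates all len(alphabet)**n N-grams, so it is only practical
    for small n.
    """
    exact = exact_ngram_counts(text, n)
    counts = {}
    for gram in map("".join, product(alphabet, repeat=n)):
        total = sum(c for window, c in exact.items()
                    if hamming(gram, window) <= max_mismatches)
        if total:
            counts[gram] = total
    return counts


if __name__ == "__main__":
    dna = "ACGTACGA"
    print(exact_ngram_counts(dna, 3))
    print(imperfect_ngram_counts(dna, 3, max_mismatches=1))
```

With `max_mismatches=0` the sketch reduces to the regular N-gram vocabulary. Larger values enlarge the vocabulary of each text, which is one plausible reading of how tolerating mismatches could change the comparison between texts.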