This thesis takes a small step towards a better understanding of both the nature of texts and the nature of compression distances. It does so by exploring the effects that several distortion techniques have on one of the most successful members of the family of compression distances, the Normalized Compression Distance (NCD). The experimental results show that changing the representation of texts by applying one of the explored distortion techniques can be beneficial in both NCD-based document clustering and NCD-based document search.
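For readers unfamiliar with the distance, NCD is usually defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed size of s and xy is the concatenation of x and y. The following is a minimal sketch of that definition, not the thesis's implementation: the choice of zlib as the compressor and the helper name compressed_size are assumptions made only for illustration, and the abstract does not say which compressor the experiments use.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after compression (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings.

    Values near 0 suggest the inputs share most of their information;
    values near 1 suggest they share very little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compressed size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    doc_a = b"compression distances measure shared information between texts. " * 10
    doc_b = b"compression distances measure shared structure between documents. " * 10
    print(ncd(doc_a, doc_a))  # a text compared with itself
    print(ncd(doc_a, doc_b))  # two related texts
```

With a real compressor the result is only an approximation: the distance of a text to itself is small but generally not exactly zero, and values can slightly exceed 1.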