Classifiers make it possible to automatically assign a subject to a text and even to attribute its authorship. Previous studies accomplished this task using function words and a Support Vector Machine (SVM). We instead use a data-compressor-based approach together with a document similarity metric called Normalized Compression Distance (NCD). To allow a direct comparison of results, tests were performed on the same dataset used in a previous work, comprising 3,000 documents by 100 different authors. The results show that NCD can perform slightly better on this task, depending on the compressor used.
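As an illustration of the metric named above, the following minimal sketch computes NCD using its standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length in bytes. The choice of `zlib` as the compressor, the nearest-sample decision rule, and the names `compressed_size`, `ncd`, and `attribute_author` are assumptions made for this example; the abstract does not state which compressors or classification procedure were actually used.

```python
import zlib


def compressed_size(data: bytes, compressor=zlib.compress) -> int:
    """Length in bytes of the compressed input, used as C(.) in NCD."""
    return len(compressor(data))


def ncd(x: bytes, y: bytes, compressor=zlib.compress) -> float:
    """Normalized Compression Distance between two byte strings.

    Values near 0 suggest similar inputs; values near 1 suggest
    unrelated inputs. Real compressors only approximate the ideal
    metric, so results can fall slightly outside [0, 1].
    """
    cx = compressed_size(x, compressor)
    cy = compressed_size(y, compressor)
    cxy = compressed_size(x + y, compressor)
    return (cxy - min(cx, cy)) / max(cx, cy)


def attribute_author(document: bytes, samples: dict, compressor=zlib.compress) -> str:
    """Return the author whose sample text has the smallest NCD to the document.

    Assumption: a simple nearest-sample rule, chosen for illustration only.
    `samples` maps an author name to a byte string of that author's known text.
    """
    return min(samples, key=lambda author: ncd(document, samples[author], compressor))
```

Because the compressor is passed as a parameter, another standard-library compressor such as `bz2.compress` or `lzma.compress` can be substituted to observe how the choice of compressor affects the distances, which is the dependency the abstract points to.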