The use of compression algorithms in machine learning tasks such as clustering and classification has appeared in a variety of fields, sometimes with the promise of reducing problems of explicit feature selection. The theoretical justification for such methods has been founded on an upper bound on Kolmogorov complexity and an idealized information space. An alternative view shows that compression algorithms implicitly map strings into feature space vectors, and that compression-based similarity measures compute similarity within these feature spaces. Thus, compression-based methods are not a "parameter-free" magic bullet for feature selection and data representation, but are instead concrete similarity measures within defined feature spaces, and are therefore akin to explicit feature vector models used in standard machine learning algorithms. To underscore this point, we find theoretical and empirical connections between traditional machine learning vector models and compression, encouraging cross-fertilization in future work.
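The parallel drawn above can be illustrated with a minimal sketch (not from the paper itself): the Normalized Compression Distance (NCD), a common compression-based similarity measure, computed with zlib, placed side by side with an explicit feature-vector analogue, cosine similarity over character-bigram counts. The specific strings and the choice of bigram features are illustrative assumptions; the point is only that both measures rank the same pair of strings as more similar.

```python
import zlib
from collections import Counter
from math import sqrt

def ncd(x: bytes, y: bytes) -> float:
    # Normalized Compression Distance: approximates similarity in an
    # implicit feature space using a real compressor (here, zlib).
    # Lower values mean "more similar".
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def bigram_cosine(x: bytes, y: bytes) -> float:
    # Explicit feature-space analogue: cosine similarity over
    # character-bigram count vectors. Higher values mean "more similar".
    fx = Counter(zip(x, x[1:]))
    fy = Counter(zip(y, y[1:]))
    dot = sum(fx[k] * fy[k] for k in fx)
    nx = sqrt(sum(v * v for v in fx.values()))
    ny = sqrt(sum(v * v for v in fy.values()))
    return dot / (nx * ny) if nx and ny else 0.0

# Illustrative strings: a and b share most of their content; c does not.
a = b"the quick brown fox jumps over the lazy dog " * 4
b = b"the quick brown fox leaps over the lazy cat " * 4
c = b"lorem ipsum dolor sit amet consectetur elit " * 4

# Both measures agree on the ranking: (a, b) is the more similar pair.
assert ncd(a, b) < ncd(a, c)
assert bigram_cosine(a, b) > bigram_cosine(a, c)
```

Both measures are concrete similarity computations over feature spaces (one implicit in the compressor's model, one explicit in the bigram counts), which is the sense in which compression-based methods are akin to standard vector models.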