We survey the emerging area of compression-based, parameter-free similarity distance measures useful in data mining, pattern recognition, learning, and automatic semantics extraction. Given a family of distances on a set of objects, a distance is universal up to a certain precision for that family if it minorizes every distance in the family between every two objects in the set, up to the stated precision (we do not require the universal distance to be an element of the family). We consider similarity distances for two types of objects: literal objects that as such contain all of their meaning, like genomes or books, and names for objects. The latter may have literal embodiments like the first type, but may also be abstract, like “red” or “Christianity.” For the first type we consider a family of computable distance measures corresponding to parameters expressing similarity according to particular features between pairs of literal objects. For the second type we consider similarity distances generated by web users corresponding to particular semantic relations between the (names for the) designated objects. For both families we give universal similarity distance measures, incorporating all particular distance measures in the family. In the first case the universal distance is based on compression, and in the second case it is based on Google page counts related to search terms. In both cases experiments on a massive scale give evidence of the viability of the approaches.
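To make the two kinds of distance concrete, here is a minimal Python sketch. The abstract does not state the formulas; the code assumes the commonly cited forms of the normalized compression distance (NCD) for literal objects and the normalized Google distance (NGD) for names. Several choices are illustrative assumptions rather than anything prescribed by the paper: `zlib` as a stand-in for a real-world compressor, the function names, and the sample byte strings. `ngd` takes page counts as plain arguments and calls no search engine.

```python
import hashlib
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Byte length of the zlib-compressed input (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, in its commonly cited form:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google distance, in its commonly cited form.

    fx, fy: page counts for each search term alone (must be positive)
    fxy:    page count for pages containing both terms (must be positive)
    n:      assumed total number of indexed pages (normalizing constant)
    """
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Illustrative literal objects: two near-duplicate texts and one unrelated,
# hard-to-compress byte string built deterministically from SHA-256 digests.
text_a = b"the quick brown fox jumps over the lazy dog. " * 40
text_b = b"the quick brown fox leaps over the lazy cat. " * 40
noise = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(60))

print("NCD(similar texts):  ", round(ncd(text_a, text_b), 3))
print("NCD(text vs. noise): ", round(ncd(text_a, noise), 3))

# Hypothetical page counts, made up purely for illustration.
print("NGD(hypothetical):   ", round(ngd(9_000, 8_000, 6_000, 10**9), 3))
```

Smaller values mean the two objects are more similar. With a practical compressor NCD is only an approximation, so values can land slightly outside the interval from 0 to 1.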