We study a new class of metrics appropriate for measuring effective similarity relations between sequences, with one type of similarity per metric. We propose a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, and show that it minorizes every metric in the class (that is, it is universal in that it discovers all effective similarities). We demonstrate that it too is a metric and takes values in [0, 1]; hence it may be called the similarity metric. This provides a theoretical foundation for a new general practical tool. We give two distinctive applications in widely divergent areas (the experiments by necessity use only computable approximations to the target notions). First, we computationally compare whole mitochondrial genomes and infer their evolutionary history. This results in the first completely automatically computed whole mitochondrial phylogeny tree. Second, we give a fully automatically computed language tree of 52 different languages, based on translated versions of the "Universal Declaration of Human Rights".
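The abstract states neither the distance itself nor the computable approximation used in the experiments. As a rough illustration only, assume the usual form of the normalized information distance, d(x, y) = max{K(x|y), K(y|x)} / max{K(x), K(y)}, where K denotes Kolmogorov complexity. A common way to approximate it replaces K with the output length C of a real compressor, giving (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}. The sketch below uses Python's zlib as that compressor; the choice of zlib and the names `compressed_size` and `ncd` are assumptions made for this example, not the setup used in the experiments.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a crude stand-in for K)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Compression-based approximation of the normalized information distance.

    Assumption: zlib stands in for the (noncomputable) Kolmogorov complexity,
    so the result only roughly tracks the theoretical distance and may fall
    slightly outside [0, 1].
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compressed length of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical toy inputs; real experiments would use whole genomes or texts.
    a = b"All human beings are born free and equal in dignity and rights. " * 10
    b = b"All human beings are born free and equal in dignity and in law. " * 10
    c = bytes(range(256)) * 3
    print("ncd(a, b) =", ncd(a, b))
    print("ncd(a, c) =", ncd(a, c))
```

Values near 0 suggest that one input helps compress the other, and values near 1 suggest little shared structure. A general-purpose compressor with a small window, such as zlib, is a weak approximation for long inputs, so for genome-scale data a specialized compressor would likely be needed.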