Information distance from a question to an answer
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Information shared by many objects
Proceedings of the 17th ACM conference on Information and knowledge management
New information distance measure and its application in question answering system
Journal of Computer Science and Technology
Information distance and its extensions
DS'11 Proceedings of the 14th international conference on Discovery science
Information distance and its applications
CIAA'06 Proceedings of the 11th international conference on Implementation and Application of Automata
Classifying stem cell differentiation images by information distance
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part I
Motivation: Distance measures built on the notion of text compression have been used for the comparison and classification of entire genomes and mitochondrial genomes. The present study was undertaken to explore their utility in the classification of protein sequences.
Results: We constructed compression-based distance measures (CBMs) using the Lempel-Ziv and PPMZ compression algorithms and compared their performance with that of the Smith-Waterman algorithm and BLAST, using nearest-neighbour or support vector machine classification schemes. The datasets included a subset of the SCOP protein structure database to test distant protein similarities, a set of 3-phosphoglycerate kinase sequences selected from archaean, bacterial and eukaryotic species, and low- and high-complexity sequence segments of the human proteome. CBM values depend on the length and the complexity of the sequences compared. In classification tasks, CBMs performed especially well on distantly related proteins. There, a combined measure constructed from a CBM and a BLAST score approached, or even slightly exceeded, the performance of the Smith-Waterman algorithm and two hidden Markov model-based algorithms.
Contact: kocsor@inf.u-szeged.hu
Supplementary information: http://www.inf.u-szeged.hu/~kocsor/CBMO5
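The abstract does not give the exact CBM formula. The following is therefore only a minimal sketch of one common compression-based distance, the normalized compression distance (NCD), combined with a 1-nearest-neighbour classifier. It assumes Python's standard `zlib` module (DEFLATE, an LZ77-family compressor) as a stand-in for the Lempel-Ziv compressor. PPMZ is not in the standard library and is not shown. The function names `compressed_size`, `ncd` and `nearest_neighbour` are hypothetical illustrations, not the authors' code.

```python
import zlib
from typing import List, Tuple


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib (DEFLATE) compression at max level."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Values near 0 mean that one sequence
    is largely predictable from the other. This is an assumed stand-in
    for the paper's CBMs, not their exact definition.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def nearest_neighbour(query: str, labelled: List[Tuple[str, str]]) -> str:
    """Return the label of the (sequence, label) pair closest to query by NCD."""
    _, best_label = min(labelled, key=lambda item: ncd(query, item[0]))
    return best_label
```

Usage (toy sequences, not data from the study): `nearest_neighbour(query, [(seq_a, "A"), (seq_b, "B")])` returns the label of whichever training sequence compresses best together with `query`. For very short sequences, the fixed zlib header and checksum make up a large share of each compressed size. This is one simple way a compression-based distance can end up depending on sequence length.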