An introduction to Kolmogorov complexity and its applications (2nd ed.)
Proceedings of the sixth annual international conference on Computational biology
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
On-Line Handwriting Recognition with Support Vector Machines - A Kernel Approach
IWFHR '02 Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition (IWFHR'02)
Text classification using string kernels
The Journal of Machine Learning Research
IEEE Transactions on Information Theory
IEEE Transactions on Information Theory
Image classification via LZ78 based string kernel: a comparative study
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Efficient LZ78 factorization of grammar compressed text
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
We have previously shown [8] that the LZ78 parse length can be used effectively for a music classification task: the parse length is used to compute a normalized information distance [6,7], which then drives a simple classifier. In this paper we explore a more subtle use of the LZ78 parsing algorithm. Instead of simply counting the number of phrases in the parse of a string, we use the coding dictionary constructed by LZ78 to derive a valid string kernel for a Support Vector Machine (SVM). The kernel is defined over a feature space indexed by all the phrases identified by our (modified) LZ78 compression algorithm. We report experiments with this kernel on two datasets: (i) a collection of MIDI files and (ii) Reuters-21578. We compare our technique with an n-gram based kernel. Our results indicate that the LZ78 kernel performs comparably to the best n-gram kernel, but with significantly lower computational overhead and without requiring a search for the optimal value of n.
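To make the idea of a phrase-indexed feature space concrete, the following is a minimal sketch, not the authors' method. It assumes a textbook LZ78 parse (the abstract does not describe the paper's modification) and assumes binary features, so the kernel value is simply the number of dictionary phrases two strings share. The function names `lz78_phrases`, `lz78_kernel`, and `normalized_lz78_kernel` are hypothetical and chosen for illustration only.

```python
import math


def lz78_phrases(text):
    """Return the set of dictionary phrases from a textbook LZ78 parse.

    Assumption: this is standard LZ78, not the modified variant the
    abstract mentions. Each phrase is the shortest prefix of the
    remaining input that is not yet in the dictionary.
    """
    phrases = set()
    current = ""
    for symbol in text:
        current += symbol
        if current not in phrases:
            phrases.add(current)
            current = ""
    # Any leftover `current` is a phrase already in the dictionary,
    # so it contributes no new feature.
    return phrases


def lz78_kernel(x, y):
    """Illustrative kernel: number of LZ78 phrases shared by x and y.

    With binary phrase indicators as features this is an inner product,
    hence a valid (positive semidefinite) kernel. The weighting used in
    the paper is not given in the abstract and may differ.
    """
    return len(lz78_phrases(x) & lz78_phrases(y))


def normalized_lz78_kernel(x, y):
    """Cosine-normalized variant, so that k(x, x) == 1 for non-empty x."""
    kxx = lz78_kernel(x, x)
    kyy = lz78_kernel(y, y)
    if kxx == 0 or kyy == 0:
        return 0.0
    return lz78_kernel(x, y) / math.sqrt(kxx * kyy)
```

A Gram matrix built from such a function could be passed to any SVM implementation that accepts precomputed kernels. Unlike an n-gram kernel, this construction has no length parameter to tune, because the parse itself determines the phrase lengths.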