Kernel-Based Learning of Hierarchical Multilabel Classification Models
The Journal of Machine Learning Research
Linear-Time Computation of Similarity Measures for Sequential Data
The Journal of Machine Learning Research
A dependency-based word subsequence kernel
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Graph-based learning for statistical machine translation
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Semi-supervised abstraction-augmented string kernel for multi-level bio-relation extraction
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
Conceptual modeling of online entertainment programming guide for natural language interface
NLDB'10 Proceedings of Natural Language Processing and Information Systems: 15th International Conference on Applications of Natural Language to Information Systems
Efficient algorithms for similarity measures over sequential data: a look beyond kernels
DAGM'06 Proceedings of the 28th conference on Pattern Recognition
A fast bit-parallel algorithm for gapped string kernels
ICONIP'06 Proceedings of the 13th International Conference on Neural Information Processing - Volume Part I
We present a sparse dynamic programming algorithm that, given two strings s and t, a gap penalty λ, and an integer p, computes the value of the gap-weighted length-p subsequences kernel. The algorithm runs in time O(p |M| log |t|), where M = {(i,j) | s_i = t_j} is the set of character matches between the two strings. The algorithm is easily adapted to handle bounded-length subsequences and different gap-penalty schemes, including penalizing by the total length of gaps or by the number of gaps, as well as incorporating character-specific match and gap penalties.

The new algorithm is evaluated empirically against a full dynamic programming approach and a trie-based algorithm, on both synthetic and newswire article data. The experiments show that the full dynamic programming approach is the fastest on short strings, and on long strings when the alphabet is small. On large alphabets, the new sparse dynamic programming algorithm is the most efficient. On medium-sized alphabets, the trie-based approach is best if the maximum number of allowed gaps is strongly restricted.
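As background for the baseline the abstract compares against, here is a minimal sketch of the standard full dynamic programming recursion for the gap-weighted length-p subsequences kernel (the Lodhi-style O(p·|s|·|t|) algorithm, not the paper's sparse O(p |M| log |t|) method), with a single decay factor λ charged per spanned position. The function name is illustrative, not taken from the paper.

```python
def gap_weighted_subseq_kernel(s, t, p, lam):
    """Full-DP gap-weighted subsequences kernel.

    Sums, over all common subsequences u of length p and all occurrence
    pairs of u in s and t, the weight lam ** (span in s + span in t).
    """
    n, m = len(s), len(t)
    # DPS[i][j]: contribution of length-1 common subsequences ending exactly
    # at s[i-1] and t[j-1]; one matched character spans one position in each
    # string, hence the weight lam**2.
    DPS = [[lam * lam if i and j and s[i - 1] == t[j - 1] else 0.0
            for j in range(m + 1)] for i in range(n + 1)]
    for _ in range(2, p + 1):
        # DP[i][j] accumulates contributions of shorter common subsequences
        # ending anywhere in the prefixes s[:i], t[:j], decayed by lam for
        # each extra spanned position (inclusion-exclusion on the overlap).
        DP = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                DP[i][j] = (DPS[i][j]
                            + lam * DP[i - 1][j]
                            + lam * DP[i][j - 1]
                            - lam * lam * DP[i - 1][j - 1])
        # Extend every partial subsequence by one more matched character.
        new_DPS = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if s[i - 1] == t[j - 1]:
                    new_DPS[i][j] = lam * lam * DP[i - 1][j - 1]
        DPS = new_DPS
    return sum(map(sum, DPS))
```

For example, for s = "cat", t = "cart", p = 2 the common subsequences are "ca", "at", and "ct", contributing λ⁴ + λ⁵ + λ⁷. Note that the inner work is proportional to |s|·|t| regardless of how few character matches exist; the sparse algorithm described in the abstract instead touches only the match set M, which is what makes it win on large alphabets where |M| ≪ |s|·|t|.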