In this paper, we discuss kernels that can be applied to the classification of XML documents based on their DOM trees. DOM trees are ordered trees in which every node may be labeled by a vector of attributes, including its XML tag and its textual content. We describe five new kernels suitable for such structures: a kernel based on predefined structural features; a tree kernel derived from the well-known parse tree kernel; the set tree kernel, which allows permutations of children; the string tree kernel, an extension of the so-called partial tree kernel; and the soft tree kernel, a more efficient alternative. We evaluate the kernels experimentally on a corpus containing the DOM trees of newspaper articles and on the well-known SUSANNE corpus.
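
For orientation, the parse tree kernel mentioned above can be sketched in a few lines. The sketch below is the standard Collins-Duffy formulation on ordered labeled trees, not one of the five kernels proposed in the paper. The names Node, production, delta and parse_tree_kernel, the decay parameter lam, and the reduction of each node's attribute vector to a single string label are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Ordered tree node; the label stands in for the XML tag (assumption)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def nodes(tree):
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree.children:
        yield from nodes(child)


def production(node):
    """A node's label together with the ordered labels of its children."""
    return (node.label, tuple(c.label for c in node.children))


def delta(n1, n2, lam, memo):
    """Weighted count of common tree fragments rooted at n1 and n2."""
    # Leaves root no fragment, and differing productions share none.
    if not n1.children or not n2.children:
        return 0.0
    if production(n1) != production(n2):
        return 0.0
    key = (id(n1), id(n2))
    if key not in memo:
        value = lam
        # Equal productions imply equal arity, so zip pairs children exactly.
        for c1, c2 in zip(n1.children, n2.children):
            value *= 1.0 + delta(c1, c2, lam, memo)
        memo[key] = value
    return memo[key]


def parse_tree_kernel(t1, t2, lam=1.0):
    """Sum delta over all node pairs; lam < 1 down-weights large fragments."""
    memo = {}
    return sum(delta(a, b, lam, memo) for a in nodes(t1) for b in nodes(t2))


# Two toy trees that differ only in one leaf label.
t = Node("A", [Node("B", [Node("x")]), Node("C", [Node("y")])])
u = Node("A", [Node("B", [Node("x")]), Node("C", [Node("z")])])
```

With lam = 1, parse_tree_kernel(t, t) counts the six fragments of t, and parse_tree_kernel(t, u) counts the three fragments the two trees share. Because children are compared position by position, this formulation is sensitive to child order, which is the restriction that a kernel allowing permutations of children is meant to relax.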