Kernels for Semi-Structured Data
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Hierarchical orderings of textual units
COLING '02 Proceedings of the 19th International Conference on Computational Linguistics - Volume 1
Efficient convolution kernels for dependency and constituent syntactic trees
ECML '06 Proceedings of the 17th European Conference on Machine Learning
Text classification using graph mining-based feature extraction
Knowledge-Based Systems
In this paper, we discuss the structure-based classification of documents according to their logical document structure, i.e., their DOM trees. We describe a method that uses predefined structural features, as well as four tree kernels suited to such structures. We evaluate the methods experimentally on a corpus containing the DOM trees of newspaper articles and on the well-known SUSANNE corpus. We demonstrate that, for both corpora, many text types can be learned from structural features alone.
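The abstract does not say which four tree kernels are used. As a hedged illustration of what a tree kernel over DOM-like structures computes, the sketch below implements one well-known kernel, the subset-tree kernel of Collins and Duffy, on labeled ordered trees. Treating a node's "production" as its tag plus the ordered tags of its children is an assumption made for DOM trees, and the names `Node`, `subset_tree_kernel`, `normalized_kernel`, and the decay parameter `lam` are hypothetical. This is not presented as any of the paper's own kernels.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    """Hypothetical labeled, ordered tree node (e.g. an HTML element tag)."""
    label: str
    children: list["Node"] = field(default_factory=list)

    def production(self):
        # Assumed DOM analogue of a grammar production: tag plus ordered child tags.
        return (self.label, tuple(c.label for c in self.children))


def all_nodes(tree):
    """Collect every node of the tree by iterative depth-first traversal."""
    stack, out = [tree], []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out


def subset_tree_kernel(t1, t2, lam=1.0):
    """Collins-Duffy style subset-tree kernel: K = sum over node pairs of C(n1, n2)."""
    memo = {}

    def common(n1, n2):
        key = (id(n1), id(n2))
        if key in memo:
            return memo[key]
        if n1.production() != n2.production():
            value = 0.0  # different productions share no subset tree rooted here
        else:
            # Same production: lam * product over children of (1 + C(child1, child2)).
            # For leaves the product is empty, so C = lam.
            value = lam
            for c1, c2 in zip(n1.children, n2.children):
                value *= 1.0 + common(c1, c2)
        memo[key] = value
        return value

    return sum(common(a, b) for a in all_nodes(t1) for b in all_nodes(t2))


def normalized_kernel(t1, t2, lam=1.0):
    """Cosine-normalize so that identical trees score 1.0."""
    k12 = subset_tree_kernel(t1, t2, lam)
    return k12 / sqrt(subset_tree_kernel(t1, t1, lam) * subset_tree_kernel(t2, t2, lam))


if __name__ == "__main__":
    page_a = Node("html", [Node("body", [Node("p"), Node("p")])])
    page_b = Node("html", [Node("body", [Node("p")])])
    print(subset_tree_kernel(page_a, page_a))  # 13.0
    print(subset_tree_kernel(page_a, page_b))  # 3.0
    print(round(normalized_kernel(page_a, page_b), 4))
```

In the usage example, the two pages share only the `p` leaves and the `html` root production, so their raw kernel value is 3.0. Memoizing `common` on node-identity pairs keeps the cost at one evaluation per node pair, and a kernel of this form can be passed to any kernel method (for example an SVM with a precomputed Gram matrix) to learn text types from structure alone.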