Convolution kernels for trees provide a simple means for learning with tree-structured data. The computation time of tree kernels is quadratic in the size of the trees, since all pairs of nodes need to be compared. Thus, large parse trees, obtained from HTML documents or structured network data, render convolution kernels inapplicable. In this article, we propose an effective approximation technique for parse tree kernels. The approximate tree kernels (ATKs) limit kernel computation to a sparse subset of relevant subtrees and discard redundant structures, such that training and testing of kernel-based learning methods are significantly accelerated. We devise linear programming approaches for identifying such subsets in supervised and unsupervised learning tasks, respectively. Empirically, the approximate tree kernels attain run-time improvements of up to three orders of magnitude while preserving the predictive accuracy of regular tree kernels. For unsupervised tasks, the approximate tree kernels even lead to more accurate predictions by identifying relevant dimensions in feature space.
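To illustrate the quadratic cost that motivates the approximation, the following is a minimal sketch of a convolution tree kernel in the style of Collins and Duffy's subset tree kernel. It is not the ATK method from the article; all class and function names are illustrative, and the kernel sums common subtrees over every pair of nodes from the two trees, which is exactly the O(|T1|·|T2|) comparison the abstract refers to.

```python
# Sketch of a convolution (subset) tree kernel: K(T1, T2) sums the number
# of common subtrees rooted at every node pair, so the computation is
# quadratic in tree size. Names and structure are illustrative.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def nodes(tree):
    """Collect all nodes of a tree in preorder."""
    out = [tree]
    for child in tree.children:
        out.extend(nodes(child))
    return out

def production(node):
    """A node's grammar production: its label plus its children's labels."""
    return (node.label, tuple(ch.label for ch in node.children))

def common_subtrees(n1, n2, lam=1.0):
    """Number of common subtrees rooted at n1 and n2, with decay factor lam."""
    if production(n1) != production(n2):
        return 0.0
    if not n1.children:           # matching leaves / preterminals
        return lam
    result = lam
    for a, b in zip(n1.children, n2.children):
        result *= 1.0 + common_subtrees(a, b, lam)
    return result

def tree_kernel(t1, t2, lam=1.0):
    # Quadratic step: every node of t1 is compared with every node of t2.
    return sum(common_subtrees(n1, n2, lam)
               for n1 in nodes(t1) for n2 in nodes(t2))
```

An approximation in the spirit of the abstract would restrict the outer sum to a sparse subset of relevant node (subtree) types rather than iterating over all pairs, which is where the reported run-time gains come from.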