Similarity word-sequence kernels for sentence clustering

Authors:
Jesús Andrés-Ferrer; Germán Sanchis-Trilles; Francisco Casacuberta
Affiliations:
Instituto Tecnológico de Informática, Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia; Instituto Tecnológico de Informática, Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia; Instituto Tecnológico de Informática, Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia
Venue:
SSPR&SPR'10 Proceedings of the 2010 joint IAPR international conference on Structural, syntactic, and statistical pattern recognition
Year:
2010

Abstract

In this paper, we present a novel clustering approach based on the use of kernels as similarity functions and the C-means algorithm. Several word-sequence kernels are defined and extended so that they satisfy the properties of similarity functions. Afterwards, these monolingual word-sequence kernels are extended to bilingual word-sequence kernels, and applied to the task of monolingual and bilingual sentence clustering. The motivation of this proposal is to group similar sentences into clusters so that specialised models can be trained for each cluster, thereby reducing both the size and the complexity of the initial task. We provide empirical evidence that the use of bilingual kernels can lead to better clusters, in terms of intra-cluster perplexities.
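
The abstract does not give the kernel definitions, so the following is only a rough sketch of the general idea, not the authors' formulation. It assumes a simple word n-gram count kernel, cosine normalisation to turn it into a similarity in [0, 1], a bilingual similarity taken as the average of the source-side and target-side similarities, and a C-means loop that works on the similarity matrix alone. All function names (`ngram_counts`, `raw_kernel`, `similarity`, `bilingual_similarity`, `kernel_c_means`) and the seeding strategy are hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_counts(sentence, n=2):
    """Count word n-grams of orders 1..n in a whitespace-tokenised sentence."""
    words = sentence.lower().split()
    counts = Counter()
    for order in range(1, n + 1):
        for i in range(len(words) - order + 1):
            counts[tuple(words[i:i + order])] += 1
    return counts


def raw_kernel(x, y, n=2):
    """Inner product of the two n-gram count vectors (assumed kernel)."""
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    return sum(v * cy[g] for g, v in cx.items())


def similarity(x, y, n=2):
    """Cosine-normalised kernel: lies in [0, 1] and is 1 when x equals y."""
    denom = sqrt(raw_kernel(x, x, n) * raw_kernel(y, y, n))
    return raw_kernel(x, y, n) / denom if denom else 0.0


def bilingual_similarity(pair_a, pair_b, n=2):
    """Assumed bilingual extension: mean of source and target similarities."""
    return 0.5 * (similarity(pair_a[0], pair_b[0], n)
                  + similarity(pair_a[1], pair_b[1], n))


def kernel_c_means(items, sim, c, iters=20):
    """C-means in the kernel-induced feature space (requires c <= len(items)).

    Seeds are the first c items (an assumption); every item starts in the
    cluster of its most similar seed.
    """
    m = len(items)
    K = [[sim(a, b) for b in items] for a in items]
    labels = [max(range(c), key=lambda k: K[i][k]) for i in range(m)]
    for _ in range(iters):
        clusters = [[i for i in range(m) if labels[i] == k] for k in range(c)]
        new_labels = []
        for i in range(m):
            best, best_d = labels[i], float("inf")
            for k, members in enumerate(clusters):
                if not members:
                    continue
                size = len(members)
                # Squared distance from item i to the cluster mean,
                # expressed purely through kernel values.
                cross = sum(K[i][j] for j in members) / size
                within = sum(K[j][l] for j in members for l in members) / size ** 2
                d = K[i][i] - 2 * cross + within
                if d < best_d:
                    best, best_d = k, d
            new_labels.append(best)
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

For monolingual clustering one would call `kernel_c_means(sentences, similarity, c)`; for bilingual clustering the items become (source, target) pairs and `bilingual_similarity` is passed instead. Like ordinary C-means, the loop only reaches a local optimum, so the result depends on the seeding.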