Practical solutions to the problem of diagonal dominance in kernel document clustering

Authors:
Derek Greene; Pádraig Cunningham
Affiliations:
University of Dublin, Dublin, Ireland;University of Dublin, Dublin, Ireland
Venue:
ICML '06 Proceedings of the 23rd international conference on Machine learning
Year:
2006

Abstract

In supervised kernel methods, it has been observed that the performance of the SVM classifier is poor in cases where the diagonal entries of the Gram matrix are large relative to the off-diagonal entries. This problem, referred to as diagonal dominance, often occurs when certain kernel functions are applied to sparse high-dimensional data, such as text corpora. In this paper we investigate the implications of diagonal dominance for unsupervised kernel methods, specifically in the task of document clustering. We propose a selection of strategies for addressing this issue, and evaluate their effectiveness in producing more accurate and stable clusterings.
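To make the notion of diagonal dominance concrete, the following is a minimal sketch rather than the authors' method. It assumes a plain linear kernel (inner products of binary term vectors) and a hypothetical helper, dominance_ratio, that divides the mean diagonal entry of the Gram matrix by the mean off-diagonal entry. The two toy corpora are invented for illustration: in the first, documents share only one term, while in the second they share most of their terms.

```python
import numpy as np


def dominance_ratio(K):
    """Mean diagonal entry of a Gram matrix divided by its mean off-diagonal entry.

    Hypothetical diagnostic for illustration only; larger values mean each
    document is far more similar to itself than to any other document.
    """
    n = K.shape[0]
    diag_sum = np.trace(K)
    diag_mean = diag_sum / n
    off_mean = (K.sum() - diag_sum) / (n * (n - 1))
    return diag_mean / off_mean


n_docs = 5

# Sparse toy corpus: every document contains term 0 plus three terms
# that occur in no other document (vocabulary of 1 + 5 * 3 = 16 terms).
X_sparse = np.zeros((n_docs, 16))
X_sparse[:, 0] = 1.0
for i in range(n_docs):
    X_sparse[i, 1 + 3 * i : 4 + 3 * i] = 1.0

# Dense toy corpus: every document contains terms 0, 1, 2 plus one
# unique term (vocabulary of 3 + 5 = 8 terms).
X_dense = np.zeros((n_docs, 8))
X_dense[:, :3] = 1.0
for i in range(n_docs):
    X_dense[i, 3 + i] = 1.0

# Linear kernel: the Gram matrix holds pairwise inner products.
K_sparse = X_sparse @ X_sparse.T
K_dense = X_dense @ X_dense.T

print("sparse corpus ratio:", dominance_ratio(K_sparse))
print("dense corpus ratio:", dominance_ratio(K_dense))
```

In the sparse corpus each diagonal entry is 4 and each off-diagonal entry is 1, so self-similarity outweighs every pairwise similarity by a factor of four. In the dense corpus the off-diagonal entries rise to 3 and the ratio falls to about 1.33. Real text collections are far sparser than either toy case, which is why the effect can be pronounced there.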