Mind the eigen-gap, or how to accelerate semi-supervised spectral learning algorithms

Authors:
Dimitrios Mavroeidis
Affiliations:
Radboud University Nijmegen, Nijmegen, Netherlands
Venue:
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three
Year:
2011

Abstract

Semi-supervised learning algorithms commonly incorporate the available background knowledge so that some measure of the derived model's quality is improved. Depending on the context, quality can take several forms: it can be related to generalization performance or to a simple clustering coherence measure. Recently, a novel perspective on semi-supervised learning has been put forward that associates semi-supervised clustering with the efficiency of spectral methods. More precisely, it has been demonstrated that the appropriate use of partial supervision can bias the data Laplacian matrix such that the necessary eigenvector computations are provably accelerated. This result allows data mining practitioners to use background knowledge not only to improve the quality of clustering results, but also to accelerate the required computations. In this paper, we first provide a high-level overview of the relevant efficiency-maximizing semi-supervised methods, outlining their theoretical intuitions. We then demonstrate how these methods can be extended to handle multiple clusters and discuss issues that may arise in the continuous semi-supervised solution. Finally, we illustrate the proposed extensions empirically in the context of text clustering.
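
The link between the eigengap and computational cost can be illustrated with a small numerical sketch. It is not the paper's algorithm: it assumes a synthetic symmetric positive semi-definite matrix in place of a data Laplacian, plain power iteration as the eigensolver, and a rank-one term `gamma * v v^T` as the supervision bias, where `v` is a hypothetical noisy estimate of the dominant eigenvector. All names (`power_iteration`, `eigen_ratio`, `gamma`, `v`) are illustrative choices. Power iteration converges at a rate governed by the ratio of the second-largest to the largest eigenvalue, so enlarging the gap should reduce the number of iterations.

```python
import numpy as np


def power_iteration(M, x0, tol=1e-8, max_iter=10_000):
    """Return (dominant eigenvector estimate, iterations used)."""
    x = x0 / np.linalg.norm(x0)
    for k in range(1, max_iter + 1):
        y = M @ x
        y /= np.linalg.norm(y)
        if np.linalg.norm(y - x) < tol:
            return y, k
        x = y
    return x, max_iter


def eigen_ratio(M):
    """Ratio lambda_2 / lambda_1 of a symmetric PSD matrix (smaller = faster)."""
    w = np.linalg.eigvalsh(M)  # eigenvalues in ascending order
    return w[-2] / w[-1]


rng = np.random.default_rng(0)
n = 50

# Synthetic symmetric PSD matrix with a deliberately small eigengap (1.00 vs 0.95).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
spectrum = np.concatenate([np.linspace(0.10, 0.95, n - 1), [1.0]])
A = (Q * spectrum) @ Q.T  # equals Q @ diag(spectrum) @ Q.T
u1 = Q[:, -1]             # dominant eigenvector of A

# Hypothetical "supervision": a noisy estimate of the dominant direction.
v = u1 + 0.02 * rng.standard_normal(n)
v /= np.linalg.norm(v)

# Assumed rank-one bias; gamma controls how strongly the supervision is weighted.
gamma = 1.0
B = A + gamma * np.outer(v, v)

x0 = rng.standard_normal(n)
_, iters_plain = power_iteration(A, x0)
x_biased, iters_biased = power_iteration(B, x0)

ratio_plain = eigen_ratio(A)
ratio_biased = eigen_ratio(B)
alignment = abs(x_biased @ u1)  # cosine between biased solution and u1

print(f"lambda2/lambda1: plain={ratio_plain:.3f}, biased={ratio_biased:.3f}")
print(f"iterations:      plain={iters_plain}, biased={iters_biased}")
print(f"alignment of biased solution with original eigenvector: {alignment:.3f}")
```

In this sketch the biased matrix has a smaller eigenvalue ratio and needs fewer iterations, but its dominant eigenvector is only approximately aligned with that of the original matrix, because the bias changes the problem being solved. The choice of `gamma` therefore trades speed against fidelity to the unsupervised solution.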