Knowledge and Information Systems
Semi-supervised learning algorithms commonly incorporate the available background knowledge so as to improve some measure of the derived model's quality. Depending on the context, quality can take several forms: it may refer to generalization performance or to a simple clustering coherence measure. Recently, a novel perspective on semi-supervised learning has been put forward that associates semi-supervised clustering with the efficiency of spectral methods. More precisely, it has been demonstrated that the appropriate use of partial supervision can bias the data Laplacian matrix such that the necessary eigenvector computations are provably accelerated. This result allows data mining practitioners to use background knowledge not only to improve the quality of clustering results, but also to accelerate the required computations. In this paper we first provide a high-level overview of the relevant efficiency-maximizing semi-supervised methods, outlining their theoretical intuitions. We then demonstrate how these methods can be extended to handle multiple clusters and discuss issues that may arise in the continuous semi-supervised solution. Finally, we illustrate the proposed extensions empirically in the context of text clustering.
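
The acceleration idea can be illustrated with a minimal sketch. It is not the construction used in the paper: it assumes a Gaussian affinity, a deflated normalized affinity matrix in place of the Laplacian, and supervision injected as a rank-one term `gamma * u u^T` built from a partial label vector. The function names (`normalized_affinity`, `bias_with_labels`, `gap_ratio`, `power_iteration`) and the parameters `sigma` and `gamma` are hypothetical. The sketch relies only on the standard fact that the power method converges at a rate governed by the ratio of the second-largest to the largest eigenvalue, so a bias that shrinks this ratio reduces the number of iterations.

```python
import numpy as np

def normalized_affinity(X, sigma=1.0):
    """Return S = D^{-1/2} W D^{-1/2} and its trivial top eigenvector v0.
    W is a Gaussian affinity (an assumed choice of kernel)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    v0 = np.sqrt(d / d.sum())  # unit-norm eigenvector of S with eigenvalue 1
    return S, v0

def bias_with_labels(M, labels, gamma=1.0, v0=None):
    """Add gamma * u u^T, where u is the normalized partial label vector
    (+1 / -1 for labeled points, 0 for unlabeled ones). Assumed bias form."""
    u = np.asarray(labels, dtype=float)
    if v0 is not None:
        u = u - (v0 @ u) * v0  # keep u orthogonal to the trivial eigenvector
    u = u / np.linalg.norm(u)
    return M + gamma * np.outer(u, u)

def gap_ratio(M):
    """Second-largest over largest eigenvalue; smaller means faster power method."""
    w = np.linalg.eigvalsh(M)  # eigenvalues in ascending order
    return w[-2] / w[-1]

def power_iteration(M, tol=1e-8, max_iter=10000, seed=0):
    """Return the dominant eigenvector estimate and the iterations used."""
    x = np.random.default_rng(seed).standard_normal(M.shape[0])
    x /= np.linalg.norm(x)
    for k in range(1, max_iter + 1):
        y = M @ x
        y /= np.linalg.norm(y)
        if np.linalg.norm(y - x) < tol:
            return y, k
        x = y
    return x, max_iter

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two Gaussian blobs of 30 points each (toy data, not from the paper).
    X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(3.0, 1.0, (30, 2))])
    S, v0 = normalized_affinity(X)
    M = S - np.outer(v0, v0)  # deflate the trivial eigenvector
    labels = np.zeros(60)
    labels[:5], labels[30:35] = 1.0, -1.0  # five labeled points per cluster
    M_semi = bias_with_labels(M, labels, gamma=1.0, v0=v0)
    for name, A in [("unsupervised", M), ("semi-supervised", M_semi)]:
        _, iters = power_iteration(A)
        print(name, "gap ratio:", gap_ratio(A), "iterations:", iters)
```

Running the script prints the eigenvalue ratio and the iteration count for both matrices, so the effect of the label bias can be inspected on the toy data. Whether the bias helps depends on how well the label vector aligns with the sought eigenvector, which is why the abstract speaks of an appropriate use of supervision.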