Cross-Guided Clustering: Transfer of Relevant Supervision across Tasks

Authors:
Indrajit Bhattacharya;Shantanu Godbole;Sachindra Joshi;Ashish Verma
Affiliations:
Indian Institute of Science;IBM Research - India;IBM Research - India;IBM Research - India
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2012

Abstract

Lack of supervision in clustering algorithms often leads to clusters that are not useful or interesting to human reviewers. We investigate whether supervision can be transferred automatically to the clustering of a target task by providing a relevant supervised partitioning of a dataset from a different source task. The target clustering is made more meaningful for the human user by trading off intrinsic clustering goodness on the target task for alignment with relevant supervised partitions in the source task, wherever possible. We propose a cross-guided clustering algorithm that builds on traditional k-means by aligning the target clusters with source partitions. The alignment process uses a cross-task similarity measure that discovers hidden relationships across tasks. When the source and target tasks correspond to different domains with potentially different vocabularies, we propose a projection approach that uses pivot vocabularies for the cross-domain similarity measure. Using multiple real-world and synthetic datasets, we show that our approach improves clustering accuracy significantly over traditional k-means and state-of-the-art semi-supervised clustering baselines, across a wide range of data characteristics and parameter settings.
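
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of trading off k-means goodness against alignment with a source partition. It is not the authors' method. It assumes that source and target share one feature space (so no pivot-vocabulary projection is needed), uses cosine similarity between centroids as a stand-in for the cross-task similarity measure, and treats a source partition as relevant only when that similarity reaches a threshold. The names `cross_guided_kmeans`, `lam`, `min_sim` and `init_centroids` are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a_unit @ b_unit.T

def cross_guided_kmeans(X, source_centroids, init_centroids,
                        lam=1.0, min_sim=0.5, n_iter=20):
    """Illustrative k-means variant with a pull toward source centroids.

    Assumption (not from the paper): for fixed assignments and matches,
    each centroid minimizes
        sum_i ||x_i - c||^2 + lam * ||c - s||^2
    where s is the matched source centroid, which gives
        c = (sum_i x_i + lam * s) / (n + lam).
    """
    X = np.asarray(X, dtype=float)
    source_centroids = np.asarray(source_centroids, dtype=float)
    centroids = np.array(init_centroids, dtype=float)  # copy, updated in place
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Matching step: most similar source centroid for each target centroid.
        sims = cosine_similarity(centroids, source_centroids)
        best = sims.argmax(axis=1)
        # Update step: mean of members, pulled toward a relevant source centroid.
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) == 0:
                continue  # leave an empty cluster's centroid unchanged
            total = members.sum(axis=0)
            weight = float(len(members))
            if sims[j, best[j]] >= min_sim:  # align only with relevant partitions
                total = total + lam * source_centroids[best[j]]
                weight += lam
            centroids[j] = total / weight
    return labels, centroids

# Toy usage: two target groups and two source centroids in the same space.
X = np.array([[5, 0], [5, 1], [6, 0], [6, 1],
              [0, 5], [1, 5], [0, 6], [1, 6]], dtype=float)
source = np.array([[8.0, 0.0], [0.0, 8.0]])
labels, centroids = cross_guided_kmeans(X, source, init_centroids=X[[0, 4]])
```

Setting `lam=0` reduces the update to plain k-means, while larger values trade more intrinsic clustering goodness for alignment with the source partition. Handling different vocabularies would need an extra projection step that this sketch does not attempt.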