Self-taught clustering

Authors:
Wenyuan Dai; Qiang Yang; Gui-Rong Xue; Yong Yu
Affiliations:
Shanghai Jiao Tong University, Shanghai, China; Hong Kong University of Science and Technology, Kowloon, Hong Kong; Shanghai Jiao Tong University, Shanghai, China; Shanghai Jiao Tong University, Shanghai, China
Venue:
Proceedings of the 25th International Conference on Machine Learning
Year:
2008

Abstract

This paper focuses on a new clustering task called self-taught clustering. Self-taught clustering is an instance of unsupervised transfer learning: it aims to cluster a small collection of unlabeled target data with the help of a large amount of unlabeled auxiliary data. The target and auxiliary data may differ in topic distribution. We show that even when the target data are insufficient for learning a high-quality feature representation, the auxiliary data can help learn useful features on which the target data can be clustered effectively. To tackle this problem, we propose a co-clustering based self-taught clustering algorithm that clusters the target and auxiliary data simultaneously, so that the feature representation learned from the auxiliary data influences the target data through a common set of features. Under the new data representation, clustering of the target data can be improved. Our experiments on image clustering show that the algorithm can greatly outperform several state-of-the-art clustering methods when using irrelevant unlabeled auxiliary data.