Supervised clustering of label ranking data using label preference information

Authors:
Mihajlo Grbovic;Nemanja Djuric;Shengbo Guo;Slobodan Vucetic
Affiliations:
Department of Computer and Information Sciences, Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, USA 19122;Department of Computer and Information Sciences, Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, USA 19122;Xerox Research Centre Europe, Meylan, France 38240;Department of Computer and Information Sciences, Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, USA 19122
Venue:
Machine Learning
Year:
2013

References:

Algorithms for clustering data.
Data mining: concepts and techniques.
Rank aggregation methods for the Web. Proceedings of the 10th international conference on World Wide Web.
A perspective view and survey of meta-learning. Artificial Intelligence Review.
Supervised Clustering - Algorithms and Benefits. ICTAI '04: Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence.
Supervised clustering with support vector machines. ICML '05: Proceedings of the 22nd international conference on Machine learning.
Ordering by weighted number of wins gives a good ranking for weighted tournaments. SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms.
Model-based evaluation of clustering validation measures. Pattern Recognition.
Label ranking by learning pairwise preferences. Artificial Intelligence.
Decision tree and instance-based learning for label ranking. ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
Bayesian inference for Plackett-Luce ranking models. ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
Regression Learning Vector Quantization. ICDM '09: Proceedings of the 2009 Ninth IEEE International Conference on Data Mining.
A two-view learning approach for image tag ranking. Proceedings of the fourth ACM international conference on Web search and data mining.
An Exponential Model for Infinite Rankings. The Journal of Machine Learning Research.
Preferences in AI: An overview. Artificial Intelligence.
An effective evaluation measure for clustering on evolving data streams. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining.
Discovery of interesting regions in spatial data sets using supervised clustering. PKDD'06: Proceedings of the 10th European conference on Principles and Practice of Knowledge Discovery in Databases.
Soft nearest prototype classification. IEEE Transactions on Neural Networks.

Abstract

This paper studies supervised clustering in the context of label ranking data. The goal is to partition the feature space into K clusters that are compact in both the feature space and the label ranking space. This type of clustering has many potential applications. For example, in target marketing we might want to devise K different offers or marketing strategies for our target audience. We therefore aim to cluster the customers' feature space into K clusters by leveraging the revealed or stated, potentially incomplete, customer preferences over products, such that the preferences of customers within one cluster are more similar to each other than to those of customers in other clusters. We establish several baseline algorithms and propose two principled algorithms for supervised clustering. In the first baseline, the clusters are created in an unsupervised manner, and a representative label ranking is then assigned to each cluster. In the second baseline, the label ranking space is clustered first, and the feature space is then partitioned based on the central rankings. In the third baseline, clustering is applied to a new feature space consisting of both features and label rankings, and the result is mapped back to the original feature and ranking spaces. The principled RankTree approach is based on a Ranking Tree algorithm previously proposed for label ranking prediction. Our modification starts with K random label rankings and iteratively splits the feature space to minimize the ranking loss, followed by re-calculation of the K rankings based on the cluster assignments. The MM-PL approach is a multi-prototype supervised clustering algorithm based on the Plackett-Luce (PL) probabilistic ranking model. It represents each cluster as a union of Voronoi cells defined by a set of prototypes, and assigns each cluster a set of PL label scores that determine the cluster's central ranking.
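To make the role of the PL label scores concrete, the following is a minimal Python/NumPy sketch of the standard Plackett-Luce ranking probability and of the central ranking a set of scores induces. It is not the authors' MM-PL implementation: the function names `pl_log_likelihood` and `central_ranking` are hypothetical, and the sketch covers only the ranking model, not the prototype positions or the EM-based learning.

```python
import numpy as np


def pl_log_likelihood(ranking, scores):
    """Log-probability of a ranking under a Plackett-Luce model (sketch).

    ranking: label indices, most preferred first. It may order only a subset
             of the labels; the result is then that subset ranking's marginal
             probability under PL.
    scores:  positive PL score for every label.
    """
    v = np.asarray(scores, dtype=float)[list(ranking)]
    # At step i the chosen label competes with every label not yet placed,
    # so each denominator is the sum of the remaining labels' scores.
    remaining = np.cumsum(v[::-1])[::-1]
    return float(np.sum(np.log(v) - np.log(remaining)))


def central_ranking(scores):
    """Most probable PL ranking: labels sorted by decreasing score."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return [int(i) for i in order]


if __name__ == "__main__":
    scores = [3.0, 2.0, 1.0]
    # 3/6 * 2/3 * 1/1, i.e. about 1/3
    print(np.exp(pl_log_likelihood([0, 1, 2], scores)))
    print(central_ranking([0.2, 0.5, 0.3]))  # [1, 2, 0]
```

Because the probability of a partial ranking only involves the scores of the labels it mentions, a PL-based model can use incomplete preferences without imputing the missing labels.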
Cluster membership and ranking prediction for a new instance are determined by the cluster membership of its nearest prototype. The unknown cluster PL parameters and prototype positions are learned by minimizing the ranking loss, using two variants of the expectation-maximization algorithm. The proposed algorithms were evaluated on synthetic and real-life label ranking data using several measures of cluster goodness: (1) cluster compactness in the feature space, (2) cluster compactness in the label ranking space, and (3) label ranking prediction loss. Experimental results demonstrate that the proposed MM-PL and RankTree models are superior to the baseline models. Further, MM-PL was shown to be much better than the other algorithms at handling situations with a significant fraction of missing label preferences.
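The nearest-prototype prediction rule and the label ranking prediction loss can be illustrated with a short sketch. It assumes squared Euclidean distance to the prototypes and uses Kendall's tau distance (the fraction of discordant label pairs) as the ranking loss; the abstract specifies neither choice, and `predict_ranking` and `kendall_tau_distance` are hypothetical helpers, not the authors' code. Comparing only the label pairs present in the true ranking is one simple way to score predictions when some label preferences are missing.

```python
from itertools import combinations

import numpy as np


def predict_ranking(x, prototypes, proto_cluster, cluster_scores):
    """Nearest-prototype prediction (hypothetical helper).

    The instance joins the cluster of its nearest prototype (squared
    Euclidean distance assumed) and receives that cluster's central
    ranking, i.e. its labels sorted by decreasing PL score.
    """
    dists = np.sum((np.asarray(prototypes, dtype=float) - np.asarray(x, dtype=float)) ** 2, axis=1)
    k = int(proto_cluster[int(np.argmin(dists))])
    order = np.argsort(-np.asarray(cluster_scores[k], dtype=float), kind="stable")
    return k, [int(i) for i in order]


def kendall_tau_distance(true_ranking, predicted_ranking):
    """Fraction of label pairs that the two rankings order differently.

    true_ranking may be incomplete: only the labels it contains are compared.
    predicted_ranking must contain every label that appears in true_ranking.
    """
    pos = {label: i for i, label in enumerate(predicted_ranking)}
    # combinations() keeps input order, so a precedes b in true_ranking.
    pairs = list(combinations(true_ranking, 2))
    if not pairs:
        return 0.0
    discordant = sum(1 for a, b in pairs if pos[a] > pos[b])
    return discordant / len(pairs)


if __name__ == "__main__":
    prototypes = [[0.0, 0.0], [5.0, 5.0]]
    proto_cluster = [0, 1]  # cluster index of each prototype
    cluster_scores = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
    k, ranking = predict_ranking([4.0, 4.5], prototypes, proto_cluster, cluster_scores)
    print(k, ranking)  # 1 [0, 1, 2]
    print(kendall_tau_distance([1, 0], ranking))  # 1.0
```

Averaging this distance over a test set gives a single loss value in [0, 1], where lower is better.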