Rank-aware clustering of structured datasets

Authors:
Julia Stoyanovich;Sihem Amer-Yahia
Affiliations:
Columbia University, New York, NY, USA;Yahoo! Research, New York, NY, USA
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Citing 7
Cited 1

Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Experience with personalization of Yahoo!

Communications of the ACM
Cumulated gain-based evaluation of IR techniques

ACM Transactions on Information Systems (TOIS)
Subspace clustering for high dimensional data: a review

ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Supporting ranking and clustering as generalized order-by and group-by

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Probabilistic ranking of database query results

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
RankClus: integrating clustering with ranking for heterogeneous information network analysis

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

Making interval-based clustering rank-aware

Proceedings of the 14th International Conference on Extending Database Technology

Quantified Score

Hi-index	0.00

Visualization

Abstract

In online applications such as Yahoo! Personals and Yahoo! Real Estate users define structured profiles in order to find potentially interesting matches. Typically, profiles are evaluated against large datasets and produce thousands of matches. In addition to filtering, users also specify ranking in their profile, and matches are returned in a ranked list. Top results in a list are typically homogeneous, which hinders data exploration. For example, a user looking for 1- or 2-bedroom apartments sorted by price will see a large number of cheap 1-bedrooms in undesirable neighborhoods before seeing a different apartment. An alternative to ranking is to group matches on common attribute values, e.g., cheap 1-bedrooms in good neighborhoods, 2-bedrooms with 2 baths, and choose groups in relationship with ranking. In this paper, we present a novel paradigm of rank-aware clustering, and demonstrate its effectiveness on a large dataset from Yahoo! Personals, a leading online dating site.