In online applications, such as online dating, users often query and rank large collections of structured items. Top results tend to be homogeneous, which hinders data exploration. For example, a dating website user who is looking for a partner between 20 and 40 years old, and who sorts the matches by income from highest to lowest, will see a large number of matches in their late 30s who hold an MBA degree and work in the financial industry, before seeing any matches in different age groups and walks of life. An alternative to presenting results in a ranked list is to find clusters in the result space, identified by a combination of attributes that correlate with rank. Such clusters may describe matches between 35 and 40 with an MBA, matches between 25 and 30 who work in the software industry, etc., allowing users to explore the ranked results. We refer to the problem of finding such clusters as rank-aware interval-based clustering and argue that it is not addressed by standard clustering algorithms. We formally define the problem and, to solve it, propose a novel measure of locality, together with a family of clustering quality measures appropriate for this application scenario. These ingredients may be used by a variety of clustering algorithms, and we present BARAC, a particular subspace-clustering algorithm that enables rank-aware interval-based clustering in domains with heterogeneous attributes. We validate the effectiveness of our approach with a large-scale user study, and perform an extensive experimental evaluation of efficiency, demonstrating that our methods are practical at large scale. Our evaluation is performed on large datasets from Yahoo! Personals, a leading online dating site, and on restaurant data from Yahoo! Local.
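The dating example can be made concrete with a toy sketch. This is not BARAC and does not use the paper's locality or quality measures, which the abstract does not specify. It only illustrates the idea of describing ranked results by attribute intervals that correlate with rank. The data, the fixed 5-year age buckets, the summed reciprocal-rank score, and the names `age_interval` and `rank_aware_groups` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data, not from the paper: (age, industry, income).
matches = [
    (38, "finance", 250),
    (37, "finance", 240),
    (39, "finance", 230),
    (27, "software", 180),
    (36, "finance", 220),
    (28, "software", 170),
    (26, "software", 160),
    (33, "education", 90),
    (22, "education", 60),
]


def age_interval(age, width=5):
    """Map an age to the half-open interval [lo, lo + width)."""
    lo = (age // width) * width
    return (lo, lo + width)


def rank_aware_groups(items, width=5):
    """Group items by (age interval, industry) and score each group.

    The score is the sum of reciprocal ranks of the group's members under
    an income-descending sort. This is an assumed stand-in score, chosen
    only because it rewards groups concentrated near the top of the list.
    """
    ranked = sorted(items, key=lambda m: m[2], reverse=True)
    scores = defaultdict(float)
    member_ranks = defaultdict(list)
    for rank, (age, industry, _income) in enumerate(ranked, start=1):
        key = (age_interval(age, width), industry)
        scores[key] += 1.0 / rank
        member_ranks[key].append(rank)
    # Highest-scoring groups first.
    return sorted(
        ((key, scores[key], member_ranks[key]) for key in scores),
        key=lambda group: group[1],
        reverse=True,
    )


groups = rank_aware_groups(matches)
for (interval, industry), score, ranks in groups:
    print(interval, industry, round(score, 3), ranks)
```

On this toy input, the finance matches aged 35 to 40 occupy the top of the income ranking and form the highest-scoring group, followed by the software matches aged 25 to 30. A real method would have to discover the interval boundaries and the relevant attribute subsets rather than fix them in advance, which is the part the paper's algorithm addresses.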