When a database query returns a large number of results, the user can be shown only one page of results at a time. One popular approach is to rank the results so that the "best" ones appear first. However, a standard database query result is a set of tuples with no associated ranking. Systems typically let users sort results on selected attributes, but sorting does not define a ranking. An alternative approach to the first page is not to show the best results, but to help users learn what is available in the whole result set and direct them toward what they need. In this paper, we demonstrate through a user study that a page comprising one representative from each of k clusters (generated through k-medoid clustering) is superior to several alternative candidate methods for generating representatives of a data set. Users often refine their query specifications based on the returned results, and traditional clustering may produce completely new representatives after a refinement step. Furthermore, clustering can be computationally expensive. We propose a tree-based method for efficiently generating the representatives and for smoothly adapting them as the query is refined. Experiments show that our algorithms outperform the state of the art in both result quality and efficiency.
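To make the idea of "one representative from each of k clusters" concrete, the following is a minimal sketch of a naive alternating k-medoids heuristic in Python. It is not the tree-based method proposed in the paper, whose details are not given here. The function name `k_medoid_representatives`, its parameters, and the choice of Euclidean distance over numeric tuples are all assumptions made for illustration.

```python
import random
from math import dist  # Euclidean distance between two points (Python 3.8+)


def k_medoid_representatives(points, k, max_iter=100, seed=0):
    """Pick k representative tuples with a naive alternating k-medoids heuristic.

    Hypothetical helper for illustration only; it is not the paper's
    tree-based algorithm and makes no attempt to be efficient.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # indices into `points`

    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        # The second key component breaks ties so that a medoid always
        # stays in its own cluster, which keeps every cluster non-empty.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: (dist(p, points[m]), m != i))
            clusters[nearest].append(i)

        # Update step: within each cluster, pick the member with the
        # smallest total distance to all members of that cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]

        if set(new_medoids) == set(medoids):
            break  # medoids are stable, so the heuristic has converged
        medoids = new_medoids

    # Return the representative tuples in their original result order.
    return [points[m] for m in sorted(medoids)]
```

Called on the tuples of a query result with `k` set to the page size, a function like this would return one medoid per cluster to display on the first page. Each iteration recomputes pairwise distances within every cluster, and rerunning it after a query refinement can yield an entirely different set of representatives, which are the two drawbacks of traditional clustering that the abstract points out.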