Using trees to depict a forest

Authors:
Bin Liu;H. V. Jagadish
Affiliations:
University of Michigan, Ann Arbor;University of Michigan, Ann Arbor
Venue:
Proceedings of the VLDB Endowment
Year:
2009



Abstract

When a database query has a large number of results, the user can only be shown one page of results at a time. One popular approach is to rank results such that the "best" results appear first. However, standard database query results comprise a set of tuples, with no associated ranking. Users are typically allowed to sort results on selected attributes, but no actual ranking is defined. An alternative approach to the first page is not to try to show the best results, but instead to help users learn what is available in the whole result set and direct them to finding what they need. In this paper, we demonstrate through a user study that a page comprising one representative from each of k clusters (generated through a k-medoid clustering) is superior to multiple alternative candidate methods for generating representatives of a data set. Users often refine query specifications based on returned results. Traditional clustering may lead to completely new representatives after a refinement step. Furthermore, clustering can be computationally expensive. We propose a tree-based method for efficiently generating the representatives, and smoothly adapting them with query refinement. Experiments show that our algorithms outperform the state of the art in both result quality and efficiency.
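To make the idea of "one representative from each of k clusters" concrete, here is a minimal sketch of a plain k-medoid clustering over a small result set. It is not the tree-based method the paper proposes, whose details are not given in the abstract. The function names `dist` and `k_medoids`, the Euclidean distance, and the choice of the first k points as initial medoids are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def k_medoids(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Return the indices of k medoids (representatives) in `points`.

    Naive alternating scheme: assign each point to its nearest medoid,
    then move each medoid to the cluster member with the smallest total
    distance to the other members. Illustrative only, not the paper's method.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # Assumption: deterministic start from the first k points.
    medoids = list(range(k))

    for _ in range(max_iter):
        # Assignment step: group point indices by nearest medoid.
        clusters: List[List[int]] = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: dist(p, points[medoids[c]]))
            clusters[nearest].append(i)

        # Update step: pick the most central member of each cluster.
        new_medoids = []
        for c, members in enumerate(clusters):
            if not members:
                # Possible with duplicate points; keep the old medoid.
                new_medoids.append(medoids[c])
                continue
            best = min(
                members,
                key=lambda m: sum(dist(points[m], points[j]) for j in members),
            )
            new_medoids.append(best)

        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids

    return medoids


# Usage: two obvious groups of query results, one representative each.
results = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
representatives = [results[i] for i in k_medoids(results, 2)]
print(representatives)  # -> [(1, 1), (10, 10)]
```

Each update step compares all pairs of points within a cluster, which grows quadratically with cluster size. That cost is one reason the abstract calls clustering computationally expensive, and rerunning this sketch on a refined query gives no guarantee that the earlier representatives are kept.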