Ad-hoc aggregations of ranked lists in the presence of hierarchies

Authors:
Nilesh Bansal;Sudipto Guha;Nick Koudas
Affiliations:
University of Toronto, Toronto, ON, Canada;University of Pennsylvania, Philadelphia, PA, USA;University of Toronto, Toronto, ON, Canada
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008

Citing 17
Cited 4

Term-weighting approaches in automatic text retrieval

Information Processing and Management: an International Journal
Rank aggregation methods for the Web

Proceedings of the 10th international conference on World Wide Web
Models for metasearch

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Fast algorithms for hierarchical range histogram construction

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Optimizing Multi-Feature Queries for Image Databases

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Optimal aggregation algorithms for middleware

Journal of Computer and System Sciences - Special issu on PODS 2001
Comparing Top k Lists

SIAM Journal on Discrete Mathematics
Evaluating top-k queries over web-accessible databases

ACM Transactions on Database Systems (TODS)
Visualizing tags over time

Proceedings of the 15th international conference on World Wide Web
Ranking objects based on relationships

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Answering top-k queries using views

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
IO-Top-k: index-access optimized top-k query processing

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Top-k query evaluation with probabilistic guarantees

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Seeking stable clusters in the blogosphere

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Anytime measures for top-k algorithms

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
BlogScope: a system for online analysis of high volume text streams

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Learning to order things

Journal of Artificial Intelligence Research

Ranking objects based on relationships and fixed associations

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Skip-and-prune: cosine-based top-k query processing for efficient context-sensitive document retrieval

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Promotion analysis in multi-dimensional space

Proceedings of the VLDB Endowment
Efficient and generic evaluation of ranked queries

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Quantified Score

Hi-index	0.00

Visualization

Abstract

A variety of web sites and web based services produce textual lists at varying time granularities ranked according to several criteria. For example, Google Trends produces lists of popular query keywords which can be visualized according to several criteria. At Flickr, lists of popular tags used to tag the images uploaded can be visualized as a cloud based on their popularity. Identification of the k most popular terms can be easily conducted by utilizing well known rank aggregation algorithms. In this paper we take a different approach to information discovery from such ranked lists. We maintain the same rank aggregation framework but we elevate terms at a higher level by making use of popular term hierarchies commonly available. Under such a transformation we show that typical early stopping certificates available for rank aggregation algorithms are no longer applicable. Based on this observation, in this paper, we present a probabilistic framework for early stopping in this setting. We introduce a relaxed version of the rank aggregation problem involving a deterministic stopping condition with user specified precision. We introduce an algorithm pH -- RA for the solution of this problem. In addition we introduce techniques to improve the performance of pH -- RAeven further via precomputation utilizing a sparse set system. Through a detailed experimental evaluation using synthetic and real datasets we demonstrate the efficiency of our framework.