Discovering bucket orders from full rankings

Authors:
Jianlin Feng; Qiong Fang; Wilfred Ng
Affiliations:
Huazhong University of Science and Technology, Wuhan, China; The Hong Kong University of Science and Technology, Hong Kong, Hong Kong; The Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008

Abstract

Discovering a bucket order B from a collection of possibly noisy full rankings is a fundamental problem underlying many applications that involve rankings. Informally, a bucket order is a total order that allows "ties" between items in the same bucket. A bucket order B can be viewed as a "representative" that summarizes a given set of full rankings {T1, T2, ..., Tm}; conversely, B can be viewed as an "approximation" of some "ground truth" G, where the rankings {T1, T2, ..., Tm} are simply "linear extensions" of G. Existing work on finding bucket orders, such as the dynamic programming algorithm, is mainly developed from the "representative" perspective, which maximizes the intra-bucket similarity of items when forming a bucket. This is realized by minimizing the sum of the deviations of median ranks within a bucket. From the "approximation" perspective, in contrast, each observed full ranking Ti is simply a linear extension of the "ground truth" bucket order G, so the items in a big bucket b of G are forced to have different median ranks, and b therefore has a large sum of deviations. Minimizing the sum of deviations may thus lead to the undesirable outcome that big buckets are mostly decomposed into small ones. In this paper, we propose a novel heuristic called Abnormal Rank Gap to capture inter-bucket dissimilarity for better bucket formation. In addition, we propose to use the "closeness" of multiple quantile ranks to determine whether two items should be put into the same bucket. We develop a novel bucket order discovery method termed the Bucket Gap algorithm. Our extensive experiments demonstrate that the Bucket Gap algorithm significantly outperforms the major related work, i.e., the Bucket Pivot algorithm. In particular, the error distance of the generated bucket order can be reduced by about 30% on a real paleontological dataset, and the noise tolerance can be increased from 30% to 50% on the synthetic dataset.
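
The decomposition effect described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the toy rankings, the function names, and the exact cost definition (absolute deviation of each item's median rank from the median of its bucket) are assumptions made for illustration only.

```python
from statistics import median

def is_linear_extension(ranking, buckets):
    """True if `ranking` places every earlier bucket before every later one."""
    bucket_of = {x: i for i, bucket in enumerate(buckets) for x in bucket}
    indices = [bucket_of[x] for x in ranking]
    return indices == sorted(indices)

def median_ranks(rankings):
    """Median 1-based position of each item across all full rankings."""
    return {x: median(r.index(x) + 1 for r in rankings) for x in rankings[0]}

def deviation_cost(buckets, med):
    """Assumed cost: sum of |median rank - bucket's central median rank|."""
    total = 0
    for bucket in buckets:
        center = median(med[x] for x in bucket)
        total += sum(abs(med[x] - center) for x in bucket)
    return total

# Hypothetical ground truth G with one big bucket {b, c, d}.
G = [["a"], ["b", "c", "d"], ["e"]]

# Three hand-picked full rankings, each a linear extension of G.
rankings = [
    ["a", "b", "c", "d", "e"],
    ["a", "b", "d", "c", "e"],
    ["a", "c", "b", "d", "e"],
]

med = median_ranks(rankings)
singletons = [[x] for x in rankings[0]]

print(med)                              # median rank of each item
print(deviation_cost(G, med))           # cost of the true bucket order
print(deviation_cost(singletons, med))  # cost after splitting every bucket
```

In this toy input the tied items b, c and d receive distinct median ranks, so under the assumed cost the true bucket order G scores worse than the fully split order. A method that only minimizes this sum would therefore prefer to break up the big bucket.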