Using medians to generate consensus rankings for biological data

Authors:
Sarah Cohen-Boulakia;Alain Denise;Sylvie Hamel
Affiliations:
Laboratoire de Recherche en Informatique, CNRS, UMR, Université Paris-Sud, France and AMIB Group, INRIA Saclay Ile-de-France, France;Laboratoire de Recherche en Informatique, CNRS, UMR, Université Paris-Sud, France and AMIB Group, INRIA Saclay Ile-de-France, France and Institut de Génétique et de Microbiologie;Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, QC, Canada
Venue:
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Year:
2011

Abstract

Faced with the deluge of data available in biological databases, scientists find it increasingly difficult to obtain reasonable sets of answers to their biological queries. A critical example appears in medicine, where physicians frequently need information about the genes associated with a given disease. When they pose such queries to Web portals (e.g., Entrez NCBI), they usually get a huge number of unranked answers, which are very difficult to exploit. In recent years, several ranking approaches have been proposed, but none of them is considered the most promising. Instead of treating ranking methods as alternative approaches, we propose to generate a consensus ranking that highlights the common points of a set of rankings while minimizing their disagreements. Our work is based on the concept of median, originally defined on permutations: given m permutations and a distance function, the median problem is to find a permutation that is closest to the m given permutations, that is, one that minimizes the sum of its distances to them. We have investigated the problem of computing a median of a set of m rankings that may contain different elements and ties, under a generalized Kendall-τ distance. This problem is known to be NP-hard. In this paper, we present a new heuristic for the problem and demonstrate the benefit of our approach on real queries using four different ranking methods.
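To make the median problem concrete, the following minimal Python sketch computes a generalized Kendall-τ distance between two rankings with ties and finds a median by exhaustive search. It is an illustration only and is not the heuristic presented in the paper. Several choices are assumptions made here: a ranking is represented as a list of buckets (tied elements share a bucket), all rankings cover the same elements, a pair tied in exactly one of the two rankings costs a penalty `p`, and candidate medians are restricted to strict orderings. The names `positions`, `generalized_kendall_tau`, and `brute_force_median` are hypothetical.

```python
from itertools import combinations, permutations


def positions(ranking):
    """Map each element to the index of its bucket (tied elements share an index)."""
    return {item: i for i, bucket in enumerate(ranking) for item in bucket}


def generalized_kendall_tau(r1, r2, p=1.0):
    """Count pairwise disagreements between two rankings with ties.

    Assumption: a pair ordered oppositely costs 1, and a pair tied in
    exactly one of the two rankings costs p.
    """
    pos1, pos2 = positions(r1), positions(r2)
    if pos1.keys() != pos2.keys():
        raise ValueError("rankings must cover the same elements")
    dist = 0.0
    for a, b in combinations(list(pos1), 2):
        d1 = pos1[a] - pos1[b]
        d2 = pos2[a] - pos2[b]
        if d1 * d2 < 0:                    # strictly opposite order
            dist += 1
        elif (d1 == 0) != (d2 == 0):       # tied in exactly one ranking
            dist += p
    return dist


def brute_force_median(rankings, p=1.0):
    """Try every strict ordering and keep one minimizing the summed distance.

    Exponential in the number of elements, so usable only on tiny inputs.
    Assumption: candidates are permutations (no ties in the median).
    """
    elements = list(positions(rankings[0]))
    best, best_score = None, float("inf")
    for perm in permutations(elements):
        candidate = [[x] for x in perm]
        score = sum(generalized_kendall_tau(candidate, r, p) for r in rankings)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


# Three toy rankings of the genes A, B, C (hypothetical data).
toy_rankings = [
    [["A"], ["B"], ["C"]],
    [["A"], ["C"], ["B"]],
    [["B"], ["A"], ["C"]],
]
median, score = brute_force_median(toy_rankings)
```

On this toy input the ordering A, B, C is at distance 0, 1, and 1 from the three rankings, for a total of 2, which no other strict ordering improves on. The factorial cost of the exhaustive search is consistent with the NP-hardness mentioned in the abstract and shows why a heuristic is needed for realistic input sizes.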