Using medians to generate consensus rankings for biological data

  • Authors:
  • Sarah Cohen-Boulakia;Alain Denise;Sylvie Hamel

  • Affiliations:
  • Laboratoire de Recherche en Informatique, CNRS, UMR, Université Paris-Sud, France and AMIB Group, INRIA Saclay Ile-de-France, France;Laboratoire de Recherche en Informatique, CNRS, UMR, Université Paris-Sud, France and AMIB Group, INRIA Saclay Ile-de-France, France and Institut de Génétique et de Microbiologie;Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, QC, Canada

  • Venue:
  • SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
  • Year:
  • 2011

Abstract

Faced with the deluge of data available in biological databases, scientists find it increasingly difficult to obtain reasonable sets of answers to their biological queries. A critical example appears in medicine, where physicians frequently need information about the genes associated with a given disease. When they pose such queries to Web portals (e.g., Entrez NCBI), they usually get a huge number of unranked answers, which makes the results very difficult to exploit. Although several ranking approaches have been proposed in recent years, none of them is considered the most promising. Instead of treating ranking methods as alternative approaches, we propose to generate a consensus ranking that highlights the common points of a set of rankings while minimizing their disagreements. Our work is based on the concept of a median, originally defined on permutations: given m permutations and a distance function, the median problem is to find a permutation that is closest to the m given permutations. We have investigated the problem of computing a median of a set of m rankings that may cover different elements and contain ties, under a generalized Kendall-τ distance. This problem is known to be NP-hard. In this paper, we present a new heuristic for the problem and demonstrate the benefit of our approach on real queries using four different ranking methods.
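To make the median problem concrete, here is a minimal sketch under simplifying assumptions: it uses the classical Kendall-τ distance on full permutations of the same elements with no ties, and it finds a median by exhaustive search. It implements neither the generalized distance nor the heuristic presented in the paper, and the function names `kendall_tau` and `brute_force_median` are illustrative only.

```python
from itertools import combinations, permutations


def kendall_tau(p, q):
    """Classical Kendall-tau distance: number of element pairs
    that appear in opposite relative order in p and q.
    Assumes p and q are permutations of the same elements (no ties)."""
    pos_p = {x: i for i, x in enumerate(p)}
    pos_q = {x: i for i, x in enumerate(q)}
    return sum(
        1
        for a, b in combinations(p, 2)
        if (pos_p[a] - pos_p[b]) * (pos_q[a] - pos_q[b]) < 0
    )


def brute_force_median(rankings):
    """Return (median, cost), where median is a permutation minimizing
    the sum of Kendall-tau distances to the input rankings.
    Exhaustive search over all n! candidates: only viable for tiny n."""
    elements = sorted(rankings[0])

    def cost(candidate):
        return sum(kendall_tau(candidate, r) for r in rankings)

    best = min(permutations(elements), key=cost)
    return list(best), cost(best)


# Three hypothetical rankings of four genes, best first.
rankings = [
    ["A", "B", "C", "D"],
    ["A", "B", "D", "C"],
    ["B", "A", "C", "D"],
]
median, total = brute_force_median(rankings)
print(median, total)
```

The exhaustive search examines n! candidate permutations, so it only illustrates the definition on very small inputs; the hardness of the problem is what motivates a heuristic in practice.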