Testing noisy numerical data for monotonic association

  • Authors:
  • Ulrich Bodenhofer; Martin Krone; Frank Klawonn

  • Affiliations:
  • Institute of Bioinformatics, Johannes Kepler University, 4040 Linz, Austria; Department of Computer Science, Ostfalia University of Applied Sciences, 38302 Wolfenbüttel, Germany; Department of Computer Science, Ostfalia University of Applied Sciences, 38302 Wolfenbüttel, Germany and Bioinformatics and Statistics, Helmholtz Centre for Infection Research, 38124 Braunsch ...

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Abstract

Rank correlation measures are intended to quantify the extent to which two observables are monotonically associated. However, they are designed mainly for ordinal data and are not ideally suited for noisy numerical data. To better account for noise, a family of rank correlation measures has previously been introduced that replaces classical ordering relations by fuzzy relations with smooth transitions, thereby ensuring that the correlation measure is continuous with respect to the data. This paper briefly reviews the basic concepts behind this family of rank correlation measures and investigates it from the viewpoint of robust statistics. On this basis, we introduce a framework of novel rank correlation tests. An extensive experimental evaluation on a large number of simulated data sets demonstrates that the new tests outperform the classical variants in terms of type II error rates without sacrificing good performance in terms of type I error rates. This is mainly because the new tests are more robust to noise for small samples. The Gaussian rank correlation estimator turned out to be the best choice when no prior knowledge about the data is available, whereas the new family of robust gamma tests provides an advantage when information about the noise distribution is available. An implementation of all robust rank correlation tests used in this paper is available as an R package from the CRAN repository.
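
To make the idea of replacing a crisp ordering by a fuzzy relation with a smooth transition more concrete, the following is a minimal Python sketch of a gamma-style rank correlation. It is not the authors' implementation or the API of the R package. The linear transition, the tolerance parameters `rx` and `ry`, the use of the minimum to combine degrees, the permutation test, and all function names are assumptions made for illustration only.

```python
import random


def strict_less(a, b, r):
    """Degree in [0, 1] to which a is strictly smaller than b.

    Assumption: the degree rises linearly from 0 to 1 as b - a grows
    from 0 to the tolerance r. With r <= 0 this is the crisp relation a < b.
    """
    if r <= 0:
        return 1.0 if a < b else 0.0
    return min(1.0, max(0.0, (b - a) / r))


def fuzzy_gamma(x, y, rx, ry):
    """Gamma-style coefficient (C - D) / (C + D) with fuzzy pair degrees.

    C sums the degrees to which ordered pairs are concordant, D the degrees
    to which they are discordant. Degrees are combined with the minimum
    (an illustrative choice).
    """
    n = len(x)
    concordant = 0.0
    discordant = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            less_x = strict_less(x[i], x[j], rx)
            concordant += min(less_x, strict_less(y[i], y[j], ry))
            discordant += min(less_x, strict_less(y[j], y[i], ry))
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total


def permutation_p_value(x, y, rx, ry, n_perm=1000, seed=0):
    """Two-sided p-value from randomly permuting y (generic permutation test)."""
    rng = random.Random(seed)
    observed = abs(fuzzy_gamma(x, y, rx, ry))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(fuzzy_gamma(x, shuffled, rx, ry)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

With `rx = ry = 0` the sketch reduces to the classical crisp count of concordant and discordant pairs. Larger tolerances let pairs whose values differ by less than the tolerance contribute only partially, which is the intuition behind the reduced sensitivity to small noise-induced rank swaps.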