Using transitivity to increase the accuracy of sample-based Pearson correlation coefficients

  • Authors:
  • Taylor Phillips;Chris GauthierDickey;Ramki Thurimella

  • Affiliations:
  • University of Denver, Department of Computer Science;University of Denver, Department of Computer Science;University of Denver, Department of Computer Science

  • Venue:
  • DaWaK'10 Proceedings of the 12th international conference on Data warehousing and knowledge discovery
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Pearson product-moment correlation coefficients are a well-practiced quantification of linear dependence seen across many fields. When calculating a sample-based correlation coefficient, the accuracy of the estimation is dependent on the quality and quantity of the sample. Like all statistical models, these correlation coefficients can suffer from overfitting, which results in the representation of random error instead of an underlying trend. In this paper, we discuss how Pearson product-moment correlation coefficients can utilize information outside of the two items for which the correlation is being computed. By introducing a transitive relationship with one or more additional items that meet specified criterion, our Transitive Pearson product-moment correlation coefficient can significantly reduce the error, up to over 50%, of sparse, sample-based estimations. Finally, we demonstrate that if the data is too dense or too sparse, transitivity is detrimental in reducing the correlation estimation errors.