Using transitivity to increase the accuracy of sample-based Pearson correlation coefficients

Authors:
Taylor Phillips;Chris GauthierDickey;Ramki Thurimella
Affiliations:
University of Denver, Department of Computer Science;University of Denver, Department of Computer Science;University of Denver, Department of Computer Science
Venue:
DaWaK'10 Proceedings of the 12th international conference on Data warehousing and knowledge discovery
Year:
2010

Abstract

Pearson product-moment correlation coefficients are a widely used measure of linear dependence across many fields. When a correlation coefficient is estimated from a sample, the accuracy of the estimate depends on the quality and quantity of that sample. Like all statistical models, these correlation coefficients can suffer from overfitting, in which the estimate represents random error instead of an underlying trend. In this paper, we discuss how Pearson product-moment correlation coefficients can utilize information beyond the two items for which the correlation is being computed. By introducing a transitive relationship with one or more additional items that meet specified criteria, our Transitive Pearson product-moment correlation coefficient can significantly reduce the error of sparse, sample-based estimates, in some cases by more than 50%. Finally, we demonstrate that if the data is too dense or too sparse, transitivity is detrimental rather than helpful in reducing correlation estimation error.
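
As a rough illustration of the ideas in the abstract, the sketch below computes a sample Pearson coefficient over only the positions where both items have a value, and shows one standard way a third item z constrains the correlation between x and y. It is not the paper's Transitive Pearson coefficient, whose selection criteria and formula are not given here. The function names `pearson` and `transitive_bounds` and the use of `None` for missing values are assumptions made for this example. The bound itself follows from the requirement that a 3x3 correlation matrix be positive semidefinite, so it holds for true correlations, or for sample correlations computed on one shared complete sample; it need not hold exactly when each pair is estimated from a different sparse overlap.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation over positions where both values are present.

    Missing values are represented by None (an assumption of this sketch).
    Returns None when fewer than two shared observations exist or when
    either series is constant on the shared positions.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def transitive_bounds(r_xz, r_yz):
    """Interval of r_xy values consistent with r_xz and r_yz.

    Derived from positive semidefiniteness of the 3x3 correlation matrix:
    r_xz * r_yz +/- sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    centre = r_xz * r_yz
    half_width = math.sqrt(max(0.0, (1 - r_xz ** 2) * (1 - r_yz ** 2)))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Only the first three positions are shared, so the estimate rests on n = 3.
    x = [1, 2, 3, None, 5]
    y = [2, 4, 6, 1, None]
    print(pearson(x, y))

    # Two strong correlations with z leave a narrow interval for r_xy.
    print(transitive_bounds(0.9, 0.9))
```

The interval narrows as the correlations with z grow stronger, which gives some intuition for why a well-chosen intermediate item can carry useful information about a pair that has few shared observations.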