A note on measuring overlap

  • Authors:
  • L. Egghe; M. Goovaerts

  • Affiliations:
  • Universiteit Hasselt, Diepenbeek, Belgium, and Universiteit Antwerpen (UA), Campus Drie Eiken, Wilrijk, Belgium; Universiteit Hasselt, Diepenbeek, Belgium

  • Venue:
  • Journal of Information Science
  • Year:
  • 2007

Abstract

In measuring the overlap between two sets A and B (e.g. libraries or databases), one must calculate both the overlap O(A|B) of A with respect to B (i.e. the fraction of elements of B that are also in A) and the overlap O(B|A) of B with respect to A (i.e. the fraction of elements of A that are also in B). In theory this requires two samples. In this paper we explain that one sample can suffice to determine confidence intervals for both O(A|B) and O(B|A). The paper closes with an example, measuring the overlap between the secondary sources in mathematics MathSciNet and Zentralblatt MATH, and with a remark on the estimation of the Jaccard index.
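
To make the definitions above concrete, the following is a minimal Python sketch, not the authors' procedure. It computes O(A|B), O(B|A) and the Jaccard index for two fully known toy sets, and shows the identity J = 1 / (1/O(A|B) + 1/O(B|A) - 1), which follows directly from the definitions. The function names, the toy data and the `wald_interval` helper are illustrative assumptions; the interval shown is the textbook normal-approximation interval for a sampled proportion, and the abstract does not say which interval construction the paper uses.

```python
import math


def overlap(a, b):
    """O(A|B): fraction of elements of B that are also in A."""
    return len(a & b) / len(b)


def jaccard(a, b):
    """Jaccard index: |A intersect B| / |A union B|."""
    return len(a & b) / len(a | b)


def jaccard_from_overlaps(o_ab, o_ba):
    """Jaccard index recovered from the two overlaps (both must be > 0)."""
    return 1.0 / (1.0 / o_ab + 1.0 / o_ba - 1.0)


def wald_interval(hits, n, z=1.96):
    """Normal-approximation interval for a proportion estimated from
    `hits` successes in a sample of size `n` (assumption: a standard
    textbook interval, not necessarily the one used in the paper)."""
    p = hits / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical toy collections: three shared items, two unique to A, one unique to B.
A = {"a1", "a2", "x1", "x2", "x3"}
B = {"b1", "x1", "x2", "x3"}

o_ab = overlap(A, B)          # 3 of the 4 elements of B are in A
o_ba = overlap(B, A)          # 3 of the 5 elements of A are in B
j = jaccard(A, B)             # 3 shared out of 6 distinct elements
j_check = jaccard_from_overlaps(o_ab, o_ba)

# Hypothetical sample: 30 of 100 items drawn from B were also found in A.
low, high = wald_interval(30, 100)
```

Because the Jaccard index is a function of the two overlaps alone, estimates of O(A|B) and O(B|A) also yield an estimate of the Jaccard index, which is presumably why the abstract treats it as a closing remark.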