Author cocitation analysis and Pearson's r

  • Authors: Howard D. White

  • Affiliations: College of Information Science and Technology, Drexel University, 3152 Chestnut Street, Philadelphia, PA

  • Venue: Journal of the American Society for Information Science and Technology

  • Year: 2003

Abstract

In their article "Requirements for a cocitation similarity measure, with special reference to Pearson's correlation coefficient," Ahlgren, Jarneving, and Rousseau fault traditional author cocitation analysis (ACA) for using Pearson's r as a measure of similarity between authors because it fails two tests of stability of measurement. The instabilities arise when rs are recalculated after a first coherent group of authors has been augmented by a second coherent group with whom the first has little or no cocitation. However, AJ&R neither cluster nor map their data to demonstrate how fluctuations in rs will mislead the analyst, and the problem they pose is remote from both theory and practice in traditional ACA. By entering their own rs into multidimensional scaling and clustering routines, I show that, despite r's fluctuations, clusters based on it are much the same for the combined groups as for the separate groups. The combined groups, when mapped, appear as polarized clumps of points in two-dimensional space, confirming that differences between the groups have become much more important than differences within the groups--an accurate portrayal of what has happened to the data. Moreover, r produces clusters and maps very like those based on other coefficients that AJ&R mention as possible replacements, such as a cosine similarity measure or a chi-square dissimilarity measure. Thus, r performs well enough for the purposes of ACA. Accordingly, I argue that qualitative information revealing why authors are cocited is more important than the cautions proposed in the AJ&R critique. I include notes on topics such as handling the diagonal in author cocitation matrices, lognormalizing data, and testing r for significance.
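
As a rough illustration of the coefficients the abstract compares, the sketch below computes Pearson's r and cosine similarity over a small, made-up author cocitation matrix containing two coherent groups with little cross-group cocitation, then clusters on 1 - r. The author labels, the counts, and the diagonal treatment are illustrative assumptions, not data or code from the article.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical author cocitation counts: cell (i, j) is how often
# authors i and j are cited together. Two coherent groups (A*, B*)
# with little cross-group cocitation, mirroring the scenario in the abstract.
authors = ["A1", "A2", "A3", "B1", "B2", "B3"]
cocitation = np.array([
    [ 0, 40, 35,  1,  0,  2],
    [40,  0, 38,  0,  1,  1],
    [35, 38,  0,  2,  1,  0],
    [ 1,  0,  2,  0, 50, 45],
    [ 0,  1,  1, 50,  0, 48],
    [ 2,  1,  0, 45, 48,  0],
], dtype=float)

# One common treatment of the diagonal: replace it with each author's
# highest off-diagonal count (one of several options in the ACA literature).
np.fill_diagonal(cocitation, cocitation.max(axis=1))

# Pearson's r between author profiles (rows of the matrix).
pearson = np.corrcoef(cocitation)

# Cosine similarity between the same profiles.
norms = np.linalg.norm(cocitation, axis=1)
cosine = cocitation @ cocitation.T / np.outer(norms, norms)

np.set_printoptions(precision=2, suppress=True)
print("Pearson's r:\n", pearson)
print("Cosine similarity:\n", cosine)

# Hierarchical clustering on 1 - r as a dissimilarity; on this toy data
# both coefficients recover the same two-group partition.
dist = squareform(1 - pearson, checks=False)
labels = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
print(dict(zip(authors, labels)))
```

On data like this, the A-group and B-group separate cleanly under either coefficient, which is the kind of agreement between r-based and cosine-based clusters and maps that the abstract describes.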