A study of academic collaboration in computational linguistics with latent mixtures of authors

  • Authors: Nikhil Johri; Daniel Ramage; Daniel A. McFarland; Daniel Jurafsky
  • Affiliations: Stanford University, Stanford, CA (all authors)
  • Venue: LaTeCH '11 Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
  • Year: 2011

Abstract

Academic collaboration has often been at the forefront of scientific progress, whether among prominent established researchers or between students and advisors. We suggest a theory of the different types of academic collaboration, and use topic models to computationally identify these types in the Computational Linguistics literature. A set of author-specific topics is learnt over the ACL corpus, which ranges from 1965 to 2009. The models are trained on a per-year basis, whereby only papers published up until a given year are used to learn that year's author topics. To determine the collaborative properties of papers, we use, as a metric, a function of the cosine similarity score between a paper's term vector and each author's topic signature in the year preceding the paper's publication. We apply this metric to examine questions about the nature of collaborations in Computational Linguistics research, finding that significant variations exist in the way people collaborate within different sub-fields.
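
The similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract says only that the metric is "a function of" the cosine similarity, so the sketch stops at the raw per-author cosine scores. The function names, the dictionary layout for per-year author signatures, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def author_similarities(paper_terms, signatures_by_year, authors, pub_year):
    """Score a paper against each author's signature from the preceding year.

    signatures_by_year is a hypothetical layout: {year: {author: {term: weight}}}.
    Authors with no signature in the preceding year are skipped.
    """
    prior = signatures_by_year.get(pub_year - 1, {})
    return {
        author: cosine_similarity(paper_terms, prior[author])
        for author in authors
        if author in prior
    }


# Toy data, invented purely for illustration (not drawn from the ACL corpus).
paper = Counter("parsing grammar parsing treebank".split())
signatures = {
    2008: {
        "author_a": {"parsing": 0.6, "grammar": 0.3, "treebank": 0.1},
        "author_b": {"translation": 0.7, "alignment": 0.3},
    }
}
scores = author_similarities(paper, signatures, ["author_a", "author_b"], 2009)
print(scores)
```

In this toy case the paper's vocabulary overlaps heavily with the signature of author_a and not at all with that of author_b, so the two scores differ sharply. Comparing scores across co-authors in this way is one plausible starting point for characterising a collaboration.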