A study of academic collaboration in computational linguistics with latent mixtures of authors

Authors:
Nikhil Johri; Daniel Ramage; Daniel A. McFarland; Daniel Jurafsky
Affiliations:
Stanford University, Stanford, CA (all authors)
Venue:
LaTeCH '11 Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
Year:
2011


Abstract

Academic collaboration has often been at the forefront of scientific progress, whether among prominent established researchers or between students and advisors. We suggest a theory of the different types of academic collaboration and use topic models to identify these types computationally in the Computational Linguistics literature. A set of author-specific topics is learnt over the ACL corpus, which spans 1965 to 2009. The models are trained on a per-year basis: only papers published up to a given year are used to learn that year's author topics. To determine the collaborative properties of a paper, we use as a metric a function of the cosine similarity between the paper's term vector and each author's topic signature in the year preceding the paper's publication. We apply this metric to examine questions about the nature of collaboration in Computational Linguistics research, and find that the way people collaborate varies significantly across sub-fields.
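
The similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says the metric is "a function of" the cosine similarity and does not give that function, so the code below stops at the raw cosine scores. The names `cosine_similarity`, `author_similarities`, and the `signatures` lookup keyed by (author, year) are hypothetical, and the toy data is invented purely for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def author_similarities(paper_tokens, authors, pub_year, signatures):
    """Score a paper against each author's signature from the preceding year.

    `signatures` is a hypothetical lookup {(author, year): {term: weight}},
    standing in for the per-year author topics mentioned in the abstract.
    Authors with no signature for pub_year - 1 are skipped.
    """
    paper_vector = Counter(paper_tokens)  # raw term counts (an assumption)
    scores = {}
    for author in authors:
        signature = signatures.get((author, pub_year - 1))
        if signature is not None:
            scores[author] = cosine_similarity(paper_vector, signature)
    return scores


# Toy data, invented for illustration only.
signatures = {
    ("advisor", 2008): {"parsing": 0.6, "grammar": 0.4},
    ("student", 2008): {"translation": 0.7, "alignment": 0.3},
}
paper = ["parsing", "grammar", "parsing", "translation"]
scores = author_similarities(paper, ["advisor", "student"], 2009, signatures)
```

In this toy case the paper's vocabulary overlaps more with the "advisor" signature than with the "student" signature, so the advisor receives the higher score. Using the signature from `pub_year - 1` mirrors the abstract's choice of the year preceding publication, which keeps the paper being scored out of the signature it is compared against.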