We present the results of applying statistical author-topic models to a subset of the Eclipse 3.0 source code consisting of 2,119 source files and 700,000 lines of code from 59 developers. This technique provides an intuitive and automated framework with which to mine developer contributions and competencies from a given code base while simultaneously extracting software function in the form of topics. In addition to serving as a convenient summary for program function and developer activities, our study shows that topic models provide a meaningful, effective, and statistical basis for developer similarity analysis.
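As a rough illustration of what developer similarity analysis over topics could look like, the sketch below assumes each developer has already been summarized as a probability distribution over topics by a fitted author-topic model. The developer names and numbers are made up, and Jensen-Shannon divergence is an assumed choice of measure for illustration; the abstract does not state which measure the study used.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with no overlapping support.
    """
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution; nonzero wherever p or q is nonzero.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical developer-topic distributions over three topics.
# These values are illustrative only, not results from the study.
developer_topics = {
    "dev_a": [0.7, 0.2, 0.1],
    "dev_b": [0.6, 0.3, 0.1],
    "dev_c": [0.1, 0.1, 0.8],
}


def most_similar(name, topics):
    """Return the other developer whose topic distribution is closest."""
    candidates = [
        (js_divergence(topics[name], dist), other)
        for other, dist in topics.items()
        if other != name
    ]
    return min(candidates)[1]


if __name__ == "__main__":
    for dev in developer_topics:
        print(dev, "is closest to", most_similar(dev, developer_topics))
```

A lower divergence means two developers spread their contributions across topics in a similar way, so ranking developers by this value gives one simple notion of similarity.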