The Journal of Machine Learning Research
Probabilistic dyadic data analysis with local and global consistency
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies
Journal of the ACM (JACM)
Information needs in bug reports: improving cooperation between developers and users
Proceedings of the 2010 ACM conference on Computer supported cooperative work
Software traceability with topic modeling
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1
Proceedings of the 2nd International Workshop on Web 2.0 for Software Engineering
How do developers blog?: an exploratory study
Proceedings of the 8th Working Conference on Mining Software Repositories
How do programmers ask and answer questions on the web? (NIER track)
Proceedings of the 33rd International Conference on Software Engineering
Comparing Twitter and traditional media using topic models
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Concern Localization using Information Retrieval: An Empirical Study on Linux Kernel
WCRE '11 Proceedings of the 2011 18th Working Conference on Reverse Engineering
Discriminative Topic Modeling Based on Manifold Learning
ACM Transactions on Knowledge Discovery from Data (TKDD)
Automatic categorization of bug reports using latent Dirichlet allocation
Proceedings of the 5th India Software Engineering Conference
Social coding in GitHub: transparency and collaboration in an open software repository
Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work
Inferring semantically related software terms and their taxonomy by leveraging collaborative tagging
ICSM '12 Proceedings of the 2012 IEEE International Conference on Software Maintenance (ICSM)
An empirical analysis of a network of expertise
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
StackOverflow provides a popular platform where developers post and answer questions. Recently, Treude et al. manually labeled 385 StackOverflow questions and grouped them into 10 categories based on their content. They also analyzed how tags are used in StackOverflow. In this study, we extend their work to obtain a deeper understanding of how developers interact with one another on such a question-and-answer website. First, we analyze the distributions of developers who ask and answer questions, and we investigate whether the StackOverflow community is segregated into questioners and answerers. Second, we perform automated text mining to find the kinds of topics developers ask about. We use Latent Dirichlet Allocation (LDA), a well-known topic modeling approach, to analyze the contents of tens of thousands of questions and answers, and produce five topics. Our topic modeling strategy offers a perspective on categorizing StackOverflow questions that differs from that of Treude et al.: each question can be assigned to several topics with different probabilities, and the learned topic model can automatically assign a new question to several categories with varying probabilities. Finally, we show the distributions of questions and developers across the topics generated by LDA.
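To make the idea of assigning each question to several topics with different probabilities concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling. It is an illustration under stated assumptions, not the study's implementation: the abstract does not say which inference method or tool was used, the function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy questions are all hypothetical, and two topics are used here only because the toy corpus is tiny (the study reports five).

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (theta, topic_word), where theta[d][k]
    is the estimated probability of topic k in document d and topic_word[k]
    counts the words currently assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # occurrences of word w in topic k
    topic_total = [0] * num_topics                        # all tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic mixture; each row sums to 1.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta, topic_word


# Hypothetical, hand-written questions standing in for real StackOverflow posts.
questions = [
    "how to parse json string in python".split(),
    "python json load file error".split(),
    "css center div horizontally in html page".split(),
    "html css layout div overlap".split(),
]
theta, topic_word = fit_lda(questions, num_topics=2)
for tokens, mixture in zip(questions, theta):
    print(" ".join(tokens), [round(p, 2) for p in mixture])
```

Each row of `theta` is a probability vector over topics, which is what allows a single question to belong to several categories at once. Scoring a previously unseen question against a fitted model would need an additional inference step that this sketch does not include.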