Topic models are generative probabilistic models that have been applied in information retrieval to automatically organize and provide structure to a text corpus. A topic model discovers topics in the corpus, each of which represents a real-world concept as a group of frequently co-occurring words. Researchers have recently found topics to be effective tools for structuring various software artifacts, such as source code, requirements documents, and bug reports. This research also hypothesized that using topics to describe the evolution of software repositories could be useful for maintenance and comprehension tasks. However, research has yet to determine whether these automatically discovered topic evolutions describe the evolution of source code in a way that is relevant or meaningful to project stakeholders, so it is not clear whether topic models are a suitable tool for this task. In this paper, we take a first step towards evaluating topic models for the analysis of software evolution by performing a detailed manual analysis of the source code histories of two well-known and well-documented systems, JHotDraw and jEdit. We define and compute various metrics on the discovered topic evolutions and manually investigate how and why the metrics evolve over time. We find that the large majority (87% to 89%) of topic evolutions correspond well with actual code change activities by developers. We are therefore encouraged to use topic models as tools for studying the evolution of a software system.
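The abstract does not say which metrics are computed on a topic evolution. As a rough illustration only, the following sketch assumes one plausible metric: a topic's assignment in a version. Here that is taken to be the sum of the topic's membership over every document in that version, with the per-document topic proportions assumed to come from an already fitted topic model. The names `topic_assignment`, `topic_evolution`, and `find_changes`, the input layout, and the fixed threshold are all hypothetical. Versions where a topic's assignment jumps or falls sharply are the kind of events that could then be checked by hand against the developers' actual code changes.

```python
from typing import List, Sequence, Tuple

# Hypothetical input format: a history is a chronological list of versions;
# each version is a list of per-document topic-proportion vectors theta[d][k],
# assumed to come from an already fitted topic model.
Version = List[Sequence[float]]


def topic_assignment(version: Version, k: int) -> float:
    """Sum of topic k's membership over all documents in one version."""
    return sum(theta[k] for theta in version)


def topic_evolution(history: List[Version], k: int) -> List[float]:
    """Assignment of topic k in each version, in chronological order."""
    return [topic_assignment(version, k) for version in history]


def find_changes(series: List[float], threshold: float) -> List[Tuple[int, float]]:
    """Return (version index, delta) pairs where the value moves by more than
    `threshold` from the previous version: positive deltas are spikes,
    negative deltas are drops."""
    changes = []
    for i in range(1, len(series)):
        delta = series[i] - series[i - 1]
        if abs(delta) > threshold:
            changes.append((i, delta))
    return changes


if __name__ == "__main__":
    # Toy history with two topics; a file mostly about topic 0 is added
    # in the second version.
    history = [
        [[0.9, 0.1], [0.2, 0.8]],
        [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]],
        [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]],
    ]
    series = topic_evolution(history, k=0)
    print(series)
    print(find_changes(series, threshold=0.5))
```

Because this metric sums raw memberships, a topic's assignment also grows when files are simply added. A variant normalized by the number of documents per version may suit some questions better.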