Information Retrieval (IR) methods, and topic models in particular, have recently been used to support essential software engineering (SE) tasks by enabling textual retrieval and analysis of software artifacts. In all these approaches, topic models have been applied to software artifacts in the same manner as to natural language documents (e.g., using the same settings and parameters), on the underlying assumption that source code and natural language documents are similar. However, applying topic models to software data with the same settings as for natural language text did not always produce the expected results. Recent research investigated this assumption and showed that source code is much more repetitive and predictable than natural language text. Our paper builds on this fundamental finding and proposes a novel solution to adapt, configure, and effectively use a topic modeling technique, namely Latent Dirichlet Allocation (LDA), to achieve better (acceptable) performance across various SE tasks. The solution, called LDA-GA, uses Genetic Algorithms (GA) to determine a near-optimal configuration for LDA in the context of three different SE tasks: (1) traceability link recovery, (2) feature location, and (3) software artifact labeling. The results of our empirical studies demonstrate that LDA-GA is able to identify robust LDA configurations, which lead to higher accuracy on all the datasets for these SE tasks compared to previously published results, heuristics, and the results of a combinatorial search.
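To make the idea of a GA-driven search for an LDA configuration concrete, the following is a minimal sketch and not the LDA-GA implementation itself. The abstract does not specify the encoding, the genetic operators, or the fitness function, so everything here is an assumption: the four hyperparameters (`k`, `n_iter`, `alpha`, `beta`) and their bounds, the use of uniform crossover, tournament selection, and elitism, and in particular `toy_fitness`, a placeholder standing in for whatever quality measure would be computed by actually training LDA with a given configuration.

```python
import random

# Hypothetical search space for four common LDA hyperparameters.
# The names and bounds are illustrative assumptions, not taken from the paper.
BOUNDS = {
    "k": (2, 100),         # number of topics (integer)
    "n_iter": (50, 500),   # number of sampling iterations (integer)
    "alpha": (0.01, 1.0),  # document-topic prior
    "beta": (0.01, 1.0),   # topic-word prior
}
INT_GENES = {"k", "n_iter"}


def random_gene(name, rng):
    """Draw one hyperparameter value uniformly from its bounds."""
    lo, hi = BOUNDS[name]
    return rng.randint(lo, hi) if name in INT_GENES else rng.uniform(lo, hi)


def random_config(rng):
    """Create a random individual: one complete LDA configuration."""
    return {name: random_gene(name, rng) for name in BOUNDS}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in BOUNDS}


def mutate(cfg, rng, rate=0.2):
    """Resample each gene with probability `rate`."""
    return {
        name: random_gene(name, rng) if rng.random() < rate else value
        for name, value in cfg.items()
    }


def tournament(scored, rng, size=3):
    """Return the fittest of `size` randomly chosen (fitness, config) pairs."""
    return max(rng.sample(scored, size), key=lambda pair: pair[0])[1]


def ga_search(fitness, pop_size=20, generations=30, seed=0):
    """Evolve configurations and return the fittest one found."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(cfg), cfg) for cfg in population]
        best = max(scored, key=lambda pair: pair[0])[1]
        children = [best]  # elitism: the current best survives unchanged
        while len(children) < pop_size:
            child = crossover(tournament(scored, rng), tournament(scored, rng), rng)
            children.append(mutate(child, rng))
        population = children
    return max(population, key=fitness)


def toy_fitness(cfg):
    """Placeholder objective (an assumption). A real fitness function would
    train LDA with `cfg` and score the resulting model on the task at hand."""
    return -abs(cfg["k"] - 20) - abs(cfg["alpha"] - 0.1) - abs(cfg["beta"] - 0.1)


if __name__ == "__main__":
    print(ga_search(toy_fitness, seed=1))
```

In practice the expensive step is the fitness evaluation, since each call would train a topic model, so the population size and number of generations would be chosen with that cost in mind. Because of elitism, the best configuration found never gets worse from one generation to the next.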