Can Better Identifier Splitting Techniques Help Feature Location?

Authors:
Bogdan Dit; Latifa Guerrouj; Denys Poshyvanyk; Giuliano Antoniol
Venue:
ICPC '11 Proceedings of the 2011 IEEE 19th International Conference on Program Comprehension
Year:
2011

Abstract

The paper presents an exploratory study of two feature location techniques that use three strategies for splitting identifiers: Camel Case, Samurai, and manual splitting. The main research question is: if we had a perfect technique for splitting identifiers, would it improve the accuracy of feature location techniques applied in different scenarios and settings? To answer this question, we investigate two feature location techniques, one based on Information Retrieval and the other on a combination of Information Retrieval and dynamic analysis, for locating bugs and features under various configurations of preprocessing strategies on two open-source systems, Rhino and jEdit. The results of an extensive empirical evaluation reveal that feature location techniques based on Information Retrieval can benefit from better preprocessing algorithms in some cases, and that in those cases the improvement in effectiveness of manual splitting over state-of-the-art approaches is statistically significant. However, the results for the feature location technique that combines Information Retrieval and dynamic analysis show no improvement from manual splitting, indicating that any preprocessing technique will suffice if execution data is available. Overall, our findings outline the potential benefits of further research into more sophisticated source code preprocessing techniques, which can still be useful in situations where execution information cannot be easily collected.
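To make the simplest of the three splitting strategies concrete, the following is a minimal sketch of a Camel Case splitter. It is an illustration under assumed rules, not the implementation used in the study: the function name `split_camel_case`, the regular expression, and the choice to lowercase the resulting terms are all hypothetical.

```python
import re

# Assumed token rule (not taken from the paper):
#   1. a run of capitals not followed by a lowercase letter (an acronym),
#   2. an optional capital followed by lowercase letters (a word),
#   3. a run of digits.
# Underscores and other separators match none of these and are dropped.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_camel_case(identifier):
    """Split an identifier on case changes, digits, and separators."""
    return [token.lower() for token in _TOKEN.findall(identifier)]


if __name__ == "__main__":
    for name in ["getXMLParser", "user_name2", "drawrect"]:
        print(name, "->", split_camel_case(name))
```

A purely case-based rule like this one leaves an identifier with no case change, such as `drawrect`, as a single term, which is the kind of situation where a more sophisticated or manual split can yield different terms for the retrieval step.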