Can Better Identifier Splitting Techniques Help Feature Location?

  • Authors:
  • Bogdan Dit;Latifa Guerrouj;Denys Poshyvanyk;Giuliano Antoniol

  • Affiliations:
  • -;-;-;-

  • Venue:
  • ICPC '11 Proceedings of the 2011 IEEE 19th International Conference on Program Comprehension
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The paper presents an exploratory study of two feature location techniques utilizing three strategies for splitting identifiers: Camel Case, Samurai and manual splitting of identifiers. The main research question that we ask in this study is if we had a perfect technique for splitting identifiers, would it still help improve accuracy of feature location techniques applied in different scenarios and settings? In order to answer this research question we investigate two feature location techniques, one based on Information Retrieval and the other one based on the combination of Information Retrieval and dynamic analysis, for locating bugs and features using various configurations of preprocessing strategies on two open-source systems, Rhino and jEdit. The results of an extensive empirical evaluation reveal that feature location techniques using Information Retrieval can benefit from better preprocessing algorithms in some cases, and that their improvement in effectiveness while using manual splitting over state-of-the-art approaches is statistically significant in those cases. However, the results for feature location technique using the combination of Information Retrieval and dynamic analysis do not show any improvement while using manual splitting, indicating that any preprocessing technique will suffice if execution data is available. Overall, our findings outline potential benefits of putting additional research efforts into defining more sophisticated source code preprocessing techniques as they can still be useful in situations where execution information cannot be easily collected.