Automatic query reformulations for text retrieval in software engineering

  • Authors:
  • Sonia Haiduc;Gabriele Bavota;Andrian Marcus;Rocco Oliveto;Andrea De Lucia;Tim Menzies

  • Affiliations:
  • Wayne State University, USA;University of Salerno, Italy;Wayne State University, USA;University of Molise, Italy;University of Salerno, Italy;University of West Virginia, USA

  • Venue:
  • Proceedings of the 2013 International Conference on Software Engineering
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

There are more than twenty distinct software engineering tasks addressed with text retrieval (TR) techniques, such as, traceability link recovery, feature location, refactoring, reuse, etc. A common issue with all TR applications is that the results of the retrieval depend largely on the quality of the query. When a query performs poorly, it has to be reformulated and this is a difficult task for someone who had trouble writing a good query in the first place. We propose a recommender (called Refoqus) based on machine learning, which is trained with a sample of queries and relevant results. Then, for a given query, it automatically recommends a reformulation strategy that should improve its performance, based on the properties of the query. We evaluated Refoqus empirically against four baseline approaches that are used in natural language document retrieval. The data used for the evaluation corresponds to changes from five open source systems in Java and C++ and it is used in the context of TR-based concept location in source code. Refoqus outperformed the baselines and its recommendations lead to query performance improvement or preservation in 84% of the cases (in average).