Inferring semantically related software terms and their taxonomy by leveraging collaborative tagging

  • Authors:
  • Shaowei Wang;David Lo;Lingxiao Jiang

  • Affiliations:
  • School of Information Systems, Singapore Management University;School of Information Systems, Singapore Management University;School of Information Systems, Singapore Management University

  • Venue:
  • ICSM '12 Proceedings of the 2012 IEEE International Conference on Software Maintenance (ICSM)
  • Year:
  • 2012

Abstract

Many software engineering tasks, such as feature location and duplicate bug report detection, leverage similarities among textual corpora. However, because developers use different words to express the same concept, exact matching of words is insufficient. One document can contain a particular word while another document contains a different word that is semantically related but not the same. Such word differences may cause inaccuracies in subsequent software engineering tasks. Recently, tagging has impacted the software engineering community. Developers increasingly use tags to describe important features of a software product. Many project hosting sites allow users to tag various projects with their own words. It is becoming increasingly important to understand and relate these tags. Based on the tags available from software project hosting websites, we propose a similarity metric to infer semantically related terms, each of which is a tag, and build a taxonomy that could further describe the relationships among these terms. We have built a sample taxonomy from tens of thousands of projects and their tags. Our user studies show that our proposed similarity metric for tags is indeed related to the semantic similarity of the terms, and the resultant semantic taxonomy among terms is reasonably good.