Fault-prone module detection using large-scale text features based on spam filtering

  • Authors:
  • Hideaki Hata; Osamu Mizuno; Tohru Kikuno

  • Affiliations:
  • Graduate School of Information Science and Technology, Osaka University, Osaka, Japan; Graduate School of Information Science and Technology, Kyoto Institute of Technology, Kyoto, Japan; Graduate School of Information Science and Technology, Osaka University, Osaka, Japan

  • Venue:
  • Empirical Software Engineering
  • Year:
  • 2010

Abstract

This paper proposes an approach to fault-prone module detection that uses large-scale text features and is inspired by spam filtering. The occurrences of each text feature in the source code of a module are counted, and these counts are used as training data for detection models. We prepared two detection models: a naive Bayes classifier and a logistic regression model. To show the effectiveness of our approaches, we conducted experiments with five open source projects and compared them with a well-known metrics set; our approaches achieved better detection results. The results imply that large-scale text features are useful in constructing practical detection models, and that measuring sophisticated metrics is not always necessary for detecting fault-prone modules.
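
As a rough illustration of the idea described in the abstract, the sketch below counts text features in module source code and trains a multinomial naive Bayes classifier on those counts. It is only a minimal sketch under stated assumptions, not the authors' implementation: the tokenization rule, the add-one smoothing, the function names (`tokenize`, `train_naive_bayes`, `predict`), the label strings, and the toy C-like snippets are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(source):
    """Split source text into identifiers, numbers, and single punctuation
    characters. This tokenization is an assumption made for illustration."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\w\s]", source)


def train_naive_bayes(modules, labels):
    """Accumulate per-class token counts from labelled module sources."""
    class_docs = Counter(labels)
    token_counts = {label: Counter() for label in class_docs}
    for source, label in zip(modules, labels):
        token_counts[label].update(tokenize(source))
    vocab = set()
    for counts in token_counts.values():
        vocab.update(counts)
    return {
        "class_docs": class_docs,      # number of training modules per class
        "token_counts": token_counts,  # token frequencies per class
        "vocab": vocab,                # all tokens seen during training
        "n_docs": len(labels),
    }


def predict(model, source):
    """Return the class with the highest log-posterior under multinomial
    naive Bayes with add-one smoothing (an assumed smoothing choice)."""
    features = Counter(tokenize(source))
    vocab_size = len(model["vocab"])
    scores = {}
    for label, n_label in model["class_docs"].items():
        counts = model["token_counts"][label]
        total = sum(counts.values())
        score = math.log(n_label / model["n_docs"])  # class prior
        for token, freq in features.items():
            if token not in model["vocab"]:
                continue  # ignore tokens never seen in training
            score += freq * math.log((counts[token] + 1) / (total + vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data: tiny C-like snippets with made-up labels.
train_modules = [
    "p = malloc(n); strcpy(p, s);",
    "q = malloc(m); strcpy(q, t);",
    "if (p != NULL) { free(p); }",
    "if (q != NULL) { free(q); }",
]
train_labels = ["fault-prone", "fault-prone", "not-fault-prone", "not-fault-prone"]

model = train_naive_bayes(train_modules, train_labels)
print(predict(model, "r = malloc(k); strcpy(r, u);"))
print(predict(model, "if (r != NULL) { free(r); }"))
```

A realistic setting would train on full module sources with labels derived from a project's fault history, and could use the same count features as input to a logistic regression model instead.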