Fault-prone module detection using large-scale text features based on spam filtering

Authors:
Hideaki Hata; Osamu Mizuno; Tohru Kikuno
Affiliations:
Graduate School of Information Science and Technology, Osaka University, Osaka, Japan; Graduate School of Information Science and Technology, Kyoto Institute of Technology, Kyoto, Japan; Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
Venue:
Empirical Software Engineering
Year:
2010

Abstract

This paper proposes an approach to fault-prone module detection that uses large-scale text features and is inspired by spam filtering. The number of occurrences of each text feature in the source code of a module is counted and used as training data for detection models. We prepared two detection models: a naive Bayes classifier and a logistic regression model. To show the effectiveness of our approaches, we conducted experiments with five open source projects and compared them with a well-known metrics set; our approaches achieved higher detection results. The results imply that large-scale text features are useful in constructing practical detection models, and that measuring sophisticated metrics is not always necessary for detecting fault-prone modules.
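
The counting-and-training idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer regex, the class name `NaiveBayes`, the add-one smoothing, the choice to skip unseen tokens, and the four toy C snippets with their labels are all assumptions made for illustration. It counts every token in a module's source text and feeds the counts to a multinomial naive Bayes classifier.

```python
import math
import re
from collections import Counter


def text_features(source):
    """Count each token in a module's source text.

    The regex is an assumed tokenizer: identifiers, integer literals,
    and single punctuation characters each count as one text feature.
    """
    return Counter(re.findall(r"[A-Za-z_]\w*|\d+|[^\w\s]", source))


class NaiveBayes:
    """Multinomial naive Bayes over token counts (illustrative sketch)."""

    def fit(self, modules, labels):
        self.classes = sorted(set(labels))
        self.docs = Counter(labels)  # modules per class, for the prior
        self.counts = {c: Counter() for c in self.classes}
        for source, label in zip(modules, labels):
            self.counts[label].update(text_features(source))
        # Vocabulary: every token seen in any training module.
        self.vocab = set().union(*self.counts.values())
        return self

    def log_score(self, source, c):
        total = sum(self.counts[c].values())
        score = math.log(self.docs[c] / sum(self.docs.values()))
        for token, n in text_features(source).items():
            if token not in self.vocab:
                continue  # assumption: ignore tokens never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            p = (self.counts[c][token] + 1) / (total + len(self.vocab))
            score += n * math.log(p)
        return score

    def predict(self, source):
        return max(self.classes, key=lambda c: self.log_score(source, c))


# Toy training data; the snippets and labels are invented for illustration.
train = [
    ("int *p = malloc(n); p[n] = 0;", "faulty"),
    ("char *s = malloc(k); strcpy(s, src);", "faulty"),
    ("for (i = 0; i < n; i++) sum += a[i];", "clean"),
    ("if (x > 0) return x; return -x;", "clean"),
]
model = NaiveBayes().fit([s for s, _ in train], [l for _, l in train])
print(model.predict("char *q = malloc(m); strcpy(q, buf);"))
```

No hand-designed metric is computed here: the only input to the classifier is the raw token counts, which is the sense in which the features are large-scale. A logistic regression model could be trained on the same count vectors.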