Data Mining of Software Development Databases

  • Authors:
  • Taghi M. Khoshgoftaar; Edward B. Allen; Wendell D. Jones; John P. Hudepohl

  • Affiliations:
  • Florida Atlantic University, Boca Raton, Florida, USA (taghi@cse.fau.edu); Mississippi State University, Mississippi, USA (edward.allen@computer.org); IBM, P.O. Box 12195, 600 Park Office Drive, Research Triangle Park, North Carolina, USA (wendellj@us.ibm.com); Nortel Networks, Research Triangle Park, North Carolina, USA (hudepohl@nortelnetworks.com)

  • Venue:
  • Software Quality Control
  • Year:
  • 2001

Abstract

Software quality models can predict which modules will have high risk, enabling developers to target enhancement activities to the most problematic modules. However, many find collection of the underlying software product and process metrics a daunting task.

Many software development organizations routinely use very large databases for project management, configuration management, and problem reporting which record data on events during development. These large databases can be an unintrusive source of data for software quality modeling. However, multiplied by many releases of a legacy system or a broad product line, the amount of data can overwhelm manual analysis. The field of data mining is developing ways to find valuable bits of information in very large databases. This aptly describes our software quality modeling situation.

This paper presents a case study that applied data mining techniques to software quality modeling of a very large legacy telecommunications software system's configuration management and problem reporting databases. The case study illustrates how useful models can be built and applied without interfering with development.