Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis

  • Authors: R. W. Selby; A. A. Porter

  • Affiliations: Univ. of California, Irvine; Univ. of California, Irvine

  • Venue: IEEE Transactions on Software Engineering - Special Issue on Artificial Intelligence in Software Applications

  • Year: 1988

Abstract

A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for one problem domain: software resource data analysis. The purpose of the decision trees is to identify classes of objects (software modules) that had high development effort, i.e., effort in the uppermost quartile relative to past data. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The 74 attributes (or metrics) collected and analyzed for over 4,700 objects capture a range of information about the objects: development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The analysis focuses on the characterization and evaluation of decision tree accuracy, complexity, and composition. On average across all 9,600 trees, the decision trees correctly identified 79.3% of the software modules that had high development effort or faults. The decision trees generated from the best parameter combinations correctly identified 88.4% of the modules on average. Visualization of the results is emphasized, and sample decision trees are included.
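
The labeling and scoring ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tree-generation method: the function names, the single-threshold "stump" standing in for a full decision tree, and the sample numbers are all hypothetical, and scoring by the fraction of modules classified correctly is only one possible reading of "correctly identified".

```python
from statistics import quantiles

def upper_quartile_threshold(past_efforts):
    # Third quartile (75th percentile) of historical effort values.
    return quantiles(past_efforts, n=4, method="inclusive")[2]

def label_high_effort(efforts, threshold):
    # A module is "high effort" when it exceeds the past-data threshold.
    return [effort > threshold for effort in efforts]

def stump_predict(metric_values, cutoff):
    # Hypothetical one-node tree: predict "high effort" when a single
    # metric exceeds a cutoff. Real trees branch on many attributes.
    return [value > cutoff for value in metric_values]

def fraction_correct(predicted, actual):
    # Share of modules whose predicted class matches the actual class.
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Hypothetical data: effort (hours) and one metric (source lines) per module.
past_efforts = [1, 2, 3, 4, 5, 6, 7, 8]
new_efforts = [2, 9, 4, 10]
new_source_lines = [120, 900, 700, 1500]

threshold = upper_quartile_threshold(past_efforts)
actual = label_high_effort(new_efforts, threshold)
predicted = stump_predict(new_source_lines, cutoff=500)
score = fraction_correct(predicted, actual)
```

Under these assumptions, `score` plays the role of the accuracy figures reported in the abstract, computed here for a single toy classifier rather than averaged over thousands of generated trees.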