A machine learning approach for tracing regulatory codes to product specific requirements

  • Authors:
  • Jane Cleland-Huang;Adam Czauderna;Marek Gibiec;John Emenecker

  • Affiliations:
  • DePaul University, Chicago;DePaul University, Chicago;DePaul University, Chicago;DePaul University, Chicago

  • Venue:
  • Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1
  • Year:
  • 2010

Abstract

Regulatory standards, designed to protect the safety, security, and privacy of the public, govern numerous areas of software-intensive systems. Project personnel must therefore demonstrate that an as-built system meets all relevant regulatory codes. Current methods for demonstrating compliance either rely on after-the-fact audits, which can lead to significant refactoring when regulations are not met, or require analysts to construct and use traceability matrices. Manual tracing can be prohibitively time-consuming; however, automated trace retrieval methods are not very effective because of the vocabulary mismatches that often occur between regulatory codes and product-level requirements. This paper introduces and evaluates two machine-learning methods designed to improve the quality of traces generated between regulatory codes and product-level requirements. The first approach uses manually created traceability matrices to train a trace classifier, while the second approach uses web-mining techniques to reconstruct the original trace query. The techniques were evaluated against security regulations from the U.S. government's Health Insurance Portability and Accountability Act (HIPAA), traced against ten healthcare-related requirements specifications. Results demonstrated improvements for the subset of HIPAA regulations that exhibited high fan-out behavior across the requirements datasets.
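
To make the first approach more concrete, the following is a minimal sketch of how a trace classifier might be trained from a manually created traceability matrix. It is not the authors' algorithm: the abstract does not specify the classifier, so the term-weighting scheme, the function names (`tokenize`, `train_indicator_weights`, `score`), and the four sample requirements are all assumptions made for illustration. Each term is weighted by the fraction of training requirements containing it that were manually traced to a given regulation, and a new requirement is scored by the average weight of its terms.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase alphabetic terms in a requirement."""
    return set(re.findall(r"[a-z]+", text.lower()))


def train_indicator_weights(requirements, linked_ids):
    """Weight each term by how often it occurs in linked requirements.

    requirements: dict of requirement id -> text (hypothetical data).
    linked_ids: ids that a manual trace matrix links to one regulation.
    """
    in_linked = Counter()  # requirements containing the term that are linked
    in_any = Counter()     # all requirements containing the term
    for req_id, text in requirements.items():
        for term in tokenize(text):
            in_any[term] += 1
            if req_id in linked_ids:
                in_linked[term] += 1
    return {term: in_linked[term] / in_any[term] for term in in_any}


def score(text, weights):
    """Average term weight; unseen terms contribute zero."""
    terms = tokenize(text)
    if not terms:
        return 0.0
    return sum(weights.get(t, 0.0) for t in terms) / len(terms)


# Hypothetical training set: R1 and R2 were manually traced to an
# audit-control regulation, R3 and R4 were not.
training = {
    "R1": "The system shall log every user access to patient records",
    "R2": "The system shall record an audit trail of login attempts",
    "R3": "The system shall display the appointment calendar",
    "R4": "The system shall print the billing summary",
}
linked = {"R1", "R2"}

weights = train_indicator_weights(training, linked)
s_audit = score("Keep an audit log of record access", weights)
s_other = score("Print the calendar", weights)
print(s_audit > s_other)
```

In this sketch, terms that appear only in traced requirements (such as "audit") receive the highest weight, while terms shared by every requirement (such as "shall") receive a neutral one. A real classifier would also need stemming, stop-word removal, and a decision threshold, none of which is shown here.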