Testing and validating machine learning classifiers by metamorphic testing

  • Authors:
  • Xiaoyuan Xie;Joshua W. K. Ho;Christian Murphy;Gail Kaiser;Baowen Xu;Tsong Yueh Chen

  • Affiliations:
  • Centre for Software Analysis and Testing, Swinburne University of Technology, Hawthorn, Vic. 3122, Australia and School of Computer Science and Engineering, Southeast University, Nanjing 210096, C ...;Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA;Department of Computer Science, Columbia University, New York, NY 10027, USA and Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19103, USA;Department of Computer Science, Columbia University, New York, NY 10027, USA;State Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China;Centre for Software Analysis and Testing, Swinburne University of Technology, Hawthorn, Vic. 3122, Australia

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2011

Abstract

Machine learning algorithms provide core functionality in many application domains, such as bioinformatics and computational linguistics. However, it is difficult to detect faults in such applications because there is often no ''test oracle'' to verify the correctness of the computed outputs. To help address this software quality problem, in this paper we present a technique for testing the implementations of the machine learning classification algorithms that support such applications. Our approach is based on ''metamorphic testing'', a technique that has been shown to be effective in alleviating the oracle problem. Also presented are a case study on a real-world machine learning application framework and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method is highly effective at killing mutants, and that observing the expected cross-validation result alone is not sufficient to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program.
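To illustrate the idea behind the abstract, the following is a minimal, hypothetical sketch of a metamorphic test (not the paper's actual test harness or relations): a toy 1-nearest-neighbour classifier is checked against the relation that permuting the training samples, with each sample kept paired with its label, must not change any prediction. A violation would reveal a fault even though no oracle states what the correct prediction is.

```python
# Hypothetical metamorphic test for a classifier: permutation of the
# training set should leave predictions unchanged (a common metamorphic
# relation for supervised classifiers; assumed here for illustration).
import random


def nn_classify(train, labels, x):
    """Predict the label of x with a simple 1-nearest-neighbour rule."""
    dists = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train]
    return labels[dists.index(min(dists))]


# Toy training data (chosen so no two distances tie).
train = [(0.0, 0.0), (1.0, 1.0), (0.8, 0.9), (0.2, 0.1)]
labels = ["A", "B", "B", "A"]
test_points = [(0.05, 0.1), (0.95, 0.9)]

# Source test case: predictions on the original training order.
original = [nn_classify(train, labels, x) for x in test_points]

# Follow-up test case: shuffle the (sample, label) pairs together.
paired = list(zip(train, labels))
random.seed(42)
random.shuffle(paired)
shuffled_train, shuffled_labels = map(list, zip(*paired))
permuted = [nn_classify(shuffled_train, shuffled_labels, x)
            for x in test_points]

# The metamorphic relation: outputs must be identical.
assert original == permuted, "metamorphic relation violated"
```

Note that the check compares the program's outputs against each other rather than against a known correct answer, which is what makes the approach applicable when no test oracle exists.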