Parameterizing random test data according to equivalence classes

  • Authors:
  • Christian Murphy; Gail Kaiser; Marta Arias

  • Affiliations:
  • Columbia University, New York, NY (all authors)

  • Venue:
  • Proceedings of the 2nd International Workshop on Random Testing, co-located with the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2007)
  • Year:
  • 2007

Abstract

We are concerned with the problem of detecting bugs in machine learning applications. In the absence of sufficient real-world data, creating suitably large data sets for testing can be a difficult task. To address this problem, we have developed an approach to creating data sets called "parameterized random data generation". Our data generation framework allows us to isolate or combine different equivalence classes as desired, and then randomly generate large data sets using the properties of those equivalence classes as parameters. This allows us to take advantage of randomness but still have control over test case selection at the system testing level. We present our findings from using the approach to test two different machine learning ranking applications.
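The paper itself does not include code, but the core idea can be illustrated with a minimal Python sketch. Here, each equivalence class is expressed as a dictionary of generation parameters, and a data set is generated by sampling values from the selected classes, so classes can be isolated or combined as the abstract describes. All class names, parameter names, and functions below are hypothetical illustrations, not the authors' implementation.

    import random

    # Hypothetical equivalence classes for a numeric attribute; each class
    # is expressed as a set of generation parameters (names illustrative).
    EQUIVALENCE_CLASSES = {
        "small_positive":  {"low": 1, "high": 10},
        "large_positive":  {"low": 1000, "high": 1000000},
        "negative":        {"low": -1000, "high": -1},
        "with_duplicates": {"low": 1, "high": 5},  # narrow range forces repeats
    }

    def generate_dataset(class_names, n_rows, n_cols, seed=None):
        """Randomly generate an n_rows x n_cols data set, drawing each
        value from one of the selected equivalence classes."""
        rng = random.Random(seed)
        params = [EQUIVALENCE_CLASSES[name] for name in class_names]
        dataset = []
        for _ in range(n_rows):
            row = []
            for _ in range(n_cols):
                p = rng.choice(params)  # combining classes = sampling among them
                row.append(rng.randint(p["low"], p["high"]))
            dataset.append(row)
        return dataset

    # Isolate a single equivalence class...
    isolated = generate_dataset(["negative"], n_rows=100, n_cols=5, seed=42)
    # ...or combine classes to exercise their interaction.
    combined = generate_dataset(["small_positive", "with_duplicates"],
                                n_rows=100, n_cols=5, seed=42)

Because the generator is seeded, any data set that exposes a failure can be regenerated exactly, which preserves the control over test case selection that the approach aims for while still benefiting from randomness.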