Subset selection from multi-experiment data sets with application to milk fatty acid profiles

Authors:
Karolien Scheerlinck;Bernard De Baets;Ivan Stefanov;Veerle Fievez
Affiliations:
Department of Applied Mathematics, Biometrics and Process Control, Ghent University, Coupure links 653, 9000 Ghent, Belgium;Department of Applied Mathematics, Biometrics and Process Control, Ghent University, Coupure links 653, 9000 Ghent, Belgium;Department of Animal Production, Ghent University, Coupure links 653, 9000 Ghent, Belgium;Department of Animal Production, Ghent University, Coupure links 653, 9000 Ghent, Belgium
Venue:
Computers and Electronics in Agriculture
Year:
2010

Abstract

Routine analyses that can handle large numbers of samples and avoid costly, time-consuming analytical techniques are of high value. These routine analyses most often require calibration, with the detailed analyses serving as reference values. A representative subset reflecting the complete range of the variables of interest is required for this purpose. In this paper, this subset selection problem is tackled for multi-experiment data sets. Conventional techniques such as the Kennard and Stone algorithm and OptiSim are compared to a new approach based on Genetic Algorithms. The challenge here is to find an adequate objective function and to modify the standard crossover and mutation operators so that the number of selected samples stays fixed. These techniques are applied to a data set containing the concentrations of 45 fatty acids, determined by a simplified reference method, in 1033 milk samples stemming from six different experiments. The objective is to select a subset of 100 samples in which each of the six experiments is sufficiently represented. While there is no obvious way to generalize the conventional methods to multi-experiment data sets, this can quite easily be accomplished for Genetic Algorithms by modifying the objective function. Our results indicate that Genetic Algorithms are very capable of handling the subset selection problem for multi-experiment data sets.
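
The abstract does not spell out the operators or the objective function, so the following is only a minimal sketch of how size-preserving crossover and mutation and an experiment-aware objective might look. Every name here (`crossover`, `mutate`, `fitness`, `min_per_exp`, `penalty`) is hypothetical, and the maximin-distance objective with a shortfall penalty is an assumption made for illustration, not the formulation used in the paper.

```python
import itertools
import math
import random
from collections import Counter


def crossover(parent_a, parent_b, k, rng):
    """Size-preserving crossover (illustrative assumption).

    Samples present in both parents are kept; the remaining slots are
    filled at random from samples present in exactly one parent.
    Both parents must be sets of exactly k sample indices.
    """
    common = parent_a & parent_b
    pool = list((parent_a | parent_b) - common)
    rng.shuffle(pool)
    return set(common) | set(pool[: k - len(common)])


def mutate(subset, n_samples, rng, rate=0.1):
    """Swap mutation (illustrative assumption).

    Each selected sample is, with probability `rate`, exchanged for a
    currently unselected one, so the subset size never changes.
    """
    subset = set(subset)
    outside = [i for i in range(n_samples) if i not in subset]
    for i in sorted(subset):
        if outside and rng.random() < rate:
            j = outside.pop(rng.randrange(len(outside)))
            subset.remove(i)
            subset.add(j)
            outside.append(i)
    return subset


def fitness(subset, X, experiment, min_per_exp, penalty):
    """Hypothetical objective: smallest pairwise distance in the subset,
    minus a penalty for every sample an experiment falls short of
    `min_per_exp`. Higher is better.
    """
    pts = sorted(subset)
    spread = min(
        math.dist(X[a], X[b]) for a, b in itertools.combinations(pts, 2)
    )
    counts = Counter(experiment[i] for i in pts)
    shortfall = sum(
        max(0, min_per_exp - counts[e]) for e in set(experiment)
    )
    return spread - penalty * shortfall


if __name__ == "__main__":
    rng = random.Random(42)
    n, k = 60, 10
    # Synthetic stand-in data: 60 samples, 3 variables, 6 experiments.
    X = [tuple(rng.random() for _ in range(3)) for _ in range(n)]
    experiment = [i % 6 for i in range(n)]
    a = set(rng.sample(range(n), k))
    b = set(rng.sample(range(n), k))
    child = mutate(crossover(a, b, k, rng), n, rng)
    print(len(child), fitness(child, X, experiment, 1, 1.0))
```

Under these assumptions, the multi-experiment requirement enters only through the penalty term in `fitness`, which mirrors the abstract's remark that the objective function is what gets modified; the operators themselves only guarantee that every candidate keeps exactly `k` samples.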