Optimisation of the decision tree technique applied to simulated sow herd datasets

  • Authors:
  • K. Kirchner; K.-H. Tölle; J. Krieter

  • Affiliations:
  • Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Hermann-Rodewald-Straße 6, 24118 Kiel, Germany

  • Venue:
  • Computers and Electronics in Agriculture
  • Year:
  • 2006

Abstract

In this study, the robustness of the C4.5 decision tree algorithm was investigated on sow herd datasets to identify the limitations of this analysis technique. First, simulated sow herd datasets that included inconsistent farmer culling policies, as they appear in real datasets, were classified. These results were compared with those for datasets generated under very uniform and simple replacement rules. Furthermore, the different pruning methods that can be configured in the decision tree tool were optimised. The evaluation parameters of all classifications were calculated with stratified k-fold cross-validation; varying the number of folds showed that 10 folds were an appropriate number for subdividing the datasets. Simplifying the sow selection in the simulation improved the sensitivity and error rate of the classifications. In particular, datasets with randomly selected and inconsistent culling rules showed lower sensitivities, between 20 and 53%. A comparison of classifications with and without pruning showed that pruning resulted in smaller trees. In the most highly branched example, the pruned tree had 23 fewer leaves and 46 fewer nodes than the unpruned tree. The two pruning methods differ in the classification parameters and in tree size, depending on the sow herd performance level and the size of the datasets.
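The evaluation procedure named in the abstract (stratified cross-validation with sensitivity and error rate as evaluation parameters) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not include C4.5 itself. The function names `stratified_folds` and `sensitivity_and_error_rate`, the class labels `"cull"` and `"keep"`, and the toy data are assumptions made for illustration only. Sensitivity is assumed to follow the usual definition, true positives divided by all actual positives.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions.

    Hypothetical helper: each class is shuffled and dealt round-robin
    across the folds, so every fold receives a near-equal share of it.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[(i + offset) % k].append(idx)
        offset += len(indices)
    return folds


def sensitivity_and_error_rate(y_true, y_pred, positive):
    """Return (sensitivity, error rate) for one positive class.

    Assumed definitions: sensitivity = TP / (TP + FN),
    error rate = misclassified samples / all samples.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    errors = sum(1 for t, p in pairs if t != p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return sensitivity, errors / len(pairs)


# Toy data (assumed): 20 culled and 80 retained sows.
toy_labels = ["cull"] * 20 + ["keep"] * 80
toy_folds = stratified_folds(toy_labels, k=10)

# Toy predictions (assumed) for four sows, scored against their true labels.
toy_sens, toy_err = sensitivity_and_error_rate(
    ["cull", "cull", "keep", "keep"],
    ["cull", "keep", "keep", "cull"],
    positive="cull",
)
```

In a full evaluation, each fold would serve once as the test set while a tree is trained on the remaining folds, and the two measures would be averaged over the folds. Stratifying keeps the comparatively rare culled class represented in every test set, which matters for a sensitivity estimate.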