A decision-tree-based alarming system for the validation of national genetic evaluations

  • Authors:
  • S. Diplaris;A. L. Symeonidis;P. A. Mitkas;G. Banos;Z. Abas

  • Affiliations:
  • Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece;Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece;Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece;Department of Animal Production, School of Veterinary Medicine, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece;Department of Agricultural Development, Democritus University of Thrace, Orestiada, Greece

  • Venue:
  • Computers and Electronics in Agriculture
  • Year:
  • 2006

Abstract

The aim of this work was to explore the feasibility of an alarming system that applies data mining (DM) techniques to genetic evaluations of dairy cattle in order to assess and assure data quality. The technique combined classification and decision-tree algorithms, Gaussian binned fitting functions, and hypothesis tests. The data were quarterly national genetic evaluations computed between February 1999 and February 2003 in nine countries. Each evaluation run included 73,000-90,000 bull records, complete with genetic values and evaluation information. Milk production traits were considered. Data mining algorithms were applied separately for each country and evaluation run to search for associations across several dimensions, including bull origin, type of proof, age of bull, and number of daughters. The data in each node were then fitted to a Gaussian function, and the quality of the fit served as a measure of data quality. To evaluate and ultimately predict decision-tree models, the implemented architecture compares the node probabilities between two models and decides on their similarity, using hypothesis tests for the standard deviation of their distribution. The key utility of this technique lies in its capacity to identify the exact node where anomalies occur and to fire a focused alarm pointing to erroneous data.
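
The two per-node checks named in the abstract (a binned Gaussian fit and a hypothesis test on the standard deviation) can be illustrated with a minimal Python sketch. The abstract does not give the authors' fit statistic, their exact test, or their data layout, so everything here is an assumption: the function names `binned_gaussian_fit` and `sd_differs` are hypothetical, the fit quality is a reduced chi-square over histogram bins, the spread comparison is a large-sample z-test on the log variance ratio, and the data are synthetic.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def binned_gaussian_fit(values, n_bins=10):
    """Bin the values and compare the observed bin counts with those expected
    under a Gaussian that uses the sample mean and standard deviation.
    Returns (mu, sigma, reduced chi-square); larger values mean a worse fit.
    Assumption: this statistic stands in for the paper's unspecified measure."""
    mu, sigma = mean(values), stdev(values)
    dist = NormalDist(mu, sigma)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    observed = [0] * n_bins
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it to the last bin.
        observed[min(int((v - lo) / width), n_bins - 1)] += 1
    chi2, used = 0.0, 0
    for i, obs in enumerate(observed):
        left, right = lo + i * width, lo + (i + 1) * width
        expected = len(values) * (dist.cdf(right) - dist.cdf(left))
        if expected >= 5:  # skip sparse bins, a common rule of thumb
            chi2 += (obs - expected) ** 2 / expected
            used += 1
    dof = max(used - 3, 1)  # bins minus 1, minus two estimated parameters
    return mu, sigma, chi2 / dof


def sd_differs(a, b, alpha=0.05):
    """Two-sided large-sample z-test on ln(var_a / var_b).
    Assumes roughly Gaussian data, where Var(ln s^2) is about 2 / (n - 1).
    Returns (alarm, p_value)."""
    log_ratio = math.log(stdev(a) ** 2 / stdev(b) ** 2)
    std_err = math.sqrt(2 / (len(a) - 1) + 2 / (len(b) - 1))
    p_value = 2 * (1 - NormalDist().cdf(abs(log_ratio / std_err)))
    return p_value < alpha, p_value


# Synthetic stand-ins for one tree node in two consecutive evaluation runs.
random.seed(0)
previous_run = [random.gauss(0.0, 1.0) for _ in range(2000)]
current_run = [random.gauss(0.0, 2.0) for _ in range(2000)]  # inflated spread

_, _, fit_quality = binned_gaussian_fit(current_run)
alarm, p_value = sd_differs(previous_run, current_run)
```

In this sketch an alarm would be raised for the node because the spread of `current_run` is twice that of `previous_run`. A real system would run both checks on every leaf of the decision tree and report only the nodes that fail.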