Determining the saliency of feature measurements obtained from images of sedimentary organic matter for use in its classification

  • Authors:
  • Andrew F. Weller;Anthony J. Harris;J. Andrew Ware;Paul S. Jarvis

  • Affiliations:
  • Department of Earth Sciences, Geological Institute, ETH Zürich CH-8092, Zürich, Switzerland;School of Applied Sciences, University of Glamorgan, Pontypridd CF37 1DL, UK;School of Computing, University of Glamorgan, Pontypridd CF37 1DL, UK;School of Computing, University of Glamorgan, Pontypridd CF37 1DL, UK

  • Venue:
  • Computers & Geosciences
  • Year:
  • 2006

Abstract

The classification of sedimentary organic matter (OM) images can be improved by determining the saliency of image analysis (IA) features measured from them. Knowing the saliency of IA feature measurements means that only the most significant discriminating features need be used in the classification process. This is an important consideration for classification techniques such as artificial neural networks (ANNs), where too many features can lead to the 'curse of dimensionality'. The classification scheme adopted in this work is a hybrid of morphologically and texturally descriptive features from previous manual classification schemes. Some of these descriptive features are assigned to IA features, along with several others built into the IA software (Halcon) to ensure that a valid cross-section is available. After an image is captured and segmented, a total of 194 features are measured for each particle. To reduce this number to a more manageable magnitude, the SPSS AnswerTree Exhaustive CHAID (χ² automatic interaction detector) classification tree algorithm is used to establish each measurement's saliency as a classification discriminator. In the case of continuous data as used here, the F-test is used as opposed to the published algorithm. The F-test checks various statistical hypotheses about the variance of groups of IA feature measurements obtained from the particles to be classified. The aim is to reduce the number of features required to perform the classification without reducing its accuracy. In the best-case scenario, 194 inputs are reduced to 8, with a subsequent multi-layer back-propagation ANN recognition rate of 98.65%. This paper demonstrates the ability of the algorithm to reduce noise, help overcome the curse of dimensionality, and facilitate an understanding of the saliency of IA features as discriminators for sedimentary OM classification.
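
As a rough illustration of how an F-test can score a continuous feature as a class discriminator, the sketch below ranks features by their one-way ANOVA F statistic (between-class variance divided by within-class variance). It is not the SPSS AnswerTree Exhaustive CHAID procedure used in the paper, which also merges categories and grows a tree; it shows only the per-feature scoring idea. The function names, the feature names `area` and `noise`, the class labels, and the toy values are all hypothetical and chosen purely for illustration.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    k = len(groups)                      # number of classes
    n = sum(len(g) for g in groups)      # total number of particles
    grand = sum(sum(g) for g in groups) / n
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the measurements around their own class mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        # Degenerate case: no spread inside any class.
        return 0.0 if ss_between == 0 else float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(samples, labels):
    """Return feature names sorted from most to least discriminating.

    samples: list of dicts mapping feature name -> measured value
    labels:  class label for each sample, in the same order
    """
    scores = {}
    for feature in samples[0]:
        by_class = {}
        for sample, label in zip(samples, labels):
            by_class.setdefault(label, []).append(sample[feature])
        scores[feature] = f_statistic(list(by_class.values()))
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 'area' separates the two classes, 'noise' does not.
toy_samples = [
    {"area": 1.0, "noise": 5.0},
    {"area": 2.0, "noise": 1.0},
    {"area": 3.0, "noise": 3.0},
    {"area": 4.0, "noise": 3.0},
    {"area": 5.0, "noise": 5.0},
    {"area": 6.0, "noise": 1.0},
]
toy_labels = ["A", "A", "A", "B", "B", "B"]

ranking = rank_features(toy_samples, toy_labels)
```

Keeping only the top-ranked entries of such a ranking is one simple way to cut a large feature set down before training a classifier, which is the general goal the abstract describes.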