Determining the saliency of feature measurements obtained from images of sedimentary organic matter for use in its classification

Authors:
Andrew F. Weller;Anthony J. Harris;J. Andrew Ware;Paul S. Jarvis
Affiliations:
Department of Earth Sciences, Geological Institute, ETH Zürich CH-8092, Zürich, Switzerland;School of Applied Sciences, University of Glamorgan, Pontypridd CF37 1DL, UK;School of Computing, University of Glamorgan, Pontypridd CF37 1DL, UK;School of Computing, University of Glamorgan, Pontypridd CF37 1DL, UK
Venue:
Computers & Geosciences
Year:
2006

Abstract

The classification of sedimentary organic matter (OM) images can be improved by determining the saliency of the image analysis (IA) features measured from them. Once the saliency of each IA feature measurement is known, only the most significant discriminating features need be used in the classification process. This is an important consideration for classification techniques such as artificial neural networks (ANNs), where too many features can lead to the 'curse of dimensionality'. The classification scheme adopted in this work is a hybrid of morphologically and texturally descriptive features from previous manual classification schemes. Some of these descriptive features are mapped to IA features, and several other features built into the IA software (Halcon) are included to ensure that a valid cross-section is available. After an image is captured and segmented, a total of 194 features are measured for each particle. To reduce this number to a more manageable size, the SPSS AnswerTree Exhaustive CHAID (χ² automatic interaction detector) classification tree algorithm is used to establish each measurement's saliency as a classification discriminator. For continuous data, as used here, the F-test is used rather than the test of the published algorithm. The F-test checks statistical hypotheses about the variance of groups of IA feature measurements obtained from the particles to be classified. The aim is to reduce the number of features required to perform the classification without reducing its accuracy. In the best-case scenario, 194 inputs are reduced to 8, with a subsequent multi-layer back-propagation ANN recognition rate of 98.65%. This paper demonstrates the ability of the algorithm to reduce noise, help overcome the curse of dimensionality, and facilitate an understanding of the saliency of IA features as discriminators for sedimentary OM classification.
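The idea of using an F-test to judge feature saliency can be illustrated with a minimal sketch. This is not the SPSS AnswerTree Exhaustive CHAID procedure used in the paper; it is a simpler stand-in that ranks each feature by its one-way ANOVA F statistic (between-class variance over within-class variance). The function names `f_statistic` and `rank_features`, the feature names, and the toy measurements are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n = len(values)   # number of particles
    k = len(groups)   # number of classes
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")

    grand_mean = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the measurements around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(rows, labels, names):
    """Return feature names sorted from most to least discriminating."""
    scores = {
        name: f_statistic([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three features measured on six particles, two classes.
names = ["area", "roundness", "noise"]
rows = [
    [10.0, 0.90, 5.0],
    [11.0, 0.80, 1.0],
    [12.0, 0.85, 3.0],
    [30.0, 0.30, 4.0],
    [31.0, 0.40, 2.0],
    [29.0, 0.35, 3.0],
]
labels = ["A", "A", "A", "B", "B", "B"]

ranking = rank_features(rows, labels, names)
top_features = ranking[:2]  # keep only the most salient features
```

In this toy case the uninformative `noise` feature receives an F statistic of zero and falls to the bottom of the ranking, so keeping the top-ranked features discards it before any classifier is trained. A classification tree such as CHAID goes further by applying such tests recursively at each split, which this sketch does not attempt.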