The semi-automated classification of sedimentary organic matter in palynological preparations
Computers & Geosciences
The classification of sedimentary organic matter (OM) images can be improved by determining the saliency of the image analysis (IA) features measured from them. Knowing the saliency of each IA feature means that only the most significant discriminating features need be used in the classification process. This is an important consideration for classification techniques such as artificial neural networks (ANNs), where too many features can lead to the 'curse of dimensionality'. The classification scheme adopted in this work is a hybrid of morphologically and texturally descriptive features from previous manual classification schemes. Some of these descriptive features are assigned to IA features, along with several others built into the IA software (Halcon) to ensure that a valid cross-section is available. After an image is captured and segmented, a total of 194 features are measured for each particle. To reduce this number to a more manageable size, the SPSS AnswerTree Exhaustive CHAID (χ² automatic interaction detector) classification tree algorithm is used to establish each measurement's saliency as a classification discriminator. Because the data used here are continuous, the F-test is used in place of the chi-squared test of the published algorithm. The F-test checks statistical hypotheses about the variance of groups of IA feature measurements obtained from the particles to be classified. The aim is to reduce the number of features required to perform the classification without reducing its accuracy. In the best-case scenario, the 194 inputs are reduced to 8, and a multi-layer back-propagation ANN trained on them achieves a recognition rate of 98.65%. This paper demonstrates the ability of the algorithm to reduce noise, help overcome the curse of dimensionality, and facilitate an understanding of the saliency of IA features as discriminators for sedimentary OM classification.
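The idea of scoring each feature by an F-test and keeping only the most salient ones can be illustrated with a minimal sketch. This is not the Exhaustive CHAID procedure of SPSS AnswerTree, which also merges categories and grows a tree; it is a simpler stand-in that ranks features by a one-way ANOVA F statistic (between-class variance over within-class variance). The function names, feature names, and toy measurements below are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F statistic for a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(rows, labels, names):
    """Return (feature name, F statistic) pairs, most salient first."""
    scores = {
        name: f_statistic([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top(rows, labels, names, keep):
    """Keep the names of the `keep` highest-scoring features."""
    return [name for name, _ in rank_features(rows, labels, names)[:keep]]


# Hypothetical toy data: two particle classes, two measured features.
# "area" separates the classes; "noise" has identical class means.
FEATURE_NAMES = ["area", "noise"]
ROWS = [
    [1.0, 5.0], [1.2, 1.0], [0.8, 3.0],  # class "A"
    [3.0, 3.0], [3.2, 5.0], [2.8, 1.0],  # class "B"
]
LABELS = ["A", "A", "A", "B", "B", "B"]

if __name__ == "__main__":
    for name, score in rank_features(ROWS, LABELS, FEATURE_NAMES):
        print(f"{name}: F = {score:.2f}")
    print("selected:", select_top(ROWS, LABELS, FEATURE_NAMES, keep=1))
```

In a setting like the one described, the rows would hold the 194 measurements per particle, and the retained subset would become the input layer of the ANN. Ranking features one at a time ignores interactions between them, which is one reason a tree-based method such as CHAID may select a different subset.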