A sparse nearest mean classifier for high dimensional multi-class problems

  • Authors:
  • Cor J. Veenman;Annabel Bolck

  • Affiliations:
  • Digital Technology and Biometrics Department, Netherlands Forensic Institute, The Hague, The Netherlands;Digital Technology and Biometrics Department, Netherlands Forensic Institute, The Hague, The Netherlands

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2011

Quantified Score

Hi-index 0.10

Visualization

Abstract

The analysis of small datasets in high dimensional spaces is inherently difficult. For two-class classification problems there are a few methods that are able to face the so-called curse of dimensionality. However, for multi-class sparsely sampled datasets there are hardly any specific methods. In this paper, we propose four multi-class classifier alternatives that effectively deal with this type of data. Moreover, these methods implicitly select a feature subset optimized for class separation. Accordingly, they are especially interesting for domains where an explanation of the problem in terms of the original features is desired. In the experiments, we applied the proposed methods to an MDMA powders dataset, where the problem was to recognize the production process. It turns out that the proposed multi-class classifiers perform well, while the few utilized features correspond to known MDMA synthesis ingredients. In addition, to show the general applicability of the methods, we applied them to several other sparse datasets, ranging from bioinformatics to chemometrics datasets having as few as tens of samples in tens to even thousands of dimensions and three to four classes. The proposed methods had the best average performance, while very few dimensions were effectively utilized.