Cohort-based kernel visualisation with scatter matrices

Authors:
Enrique Romero;Tingting Mu;Paulo J. G. Lisboa
Affiliations:
Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Spain;School of Computing, Informatics and Media, University of Bradford, UK;School of Computing and Mathematical Sciences, Liverpool John Moores University, UK
Venue:
Pattern Recognition
Year:
2012

Citing 11
Cited 0

GTM: the generative topographic mapping

Neural Computation
Nonlinear component analysis as a kernel eigenvalue problem

Neural Computation
Relationship-Based Clustering and Visualization for High-Dimensional Data Mining

INFORMS Journal on Computing
Hierarchical Gaussian process latent variable models

Proceedings of the 24th international conference on Machine learning
A Nonlinear Feature Extraction Algorithm Using Distance Transformation

IEEE Transactions on Computers
A Nonlinear Mapping for Data Structure Analysis

IEEE Transactions on Computers
Cluster-based visualisation with scatter matrices

Pattern Recognition Letters
Kernel maximum scatter difference based feature extraction and its application to face recognition

Pattern Recognition Letters
Kernel Discriminant Analysis for Positive Definite and Indefinite Kernels

IEEE Transactions on Pattern Analysis and Machine Intelligence
Nonlinear Dimensionality Reduction

Nonlinear Dimensionality Reduction
Artificial neural networks for feature extraction and multivariate data projection

IEEE Transactions on Neural Networks

Abstract

Visualisation with good discrimination between data cohorts is important for exploratory data analysis and for decision support interfaces. This paper proposes a kernel extension of the cluster-based linear visualisation method described in Lisboa et al. [15]. Representing the data in dual form permits the application of the kernel trick, so that the data are projected onto the orthonormalised cohort means in the feature space. The only parameters of the method are those of the kernel function. The method is shown to obtain well-discriminating visualisations of non-linearly separable data at low computational cost. The linearity of the visualisation was tested using nearest neighbour and linear discriminant classifiers, achieving significant improvements in classification accuracy with respect to the original features, especially for high-dimensional data, where 93% accuracy was obtained for the Splice-junction Gene Sequences data set from the UCI repository.
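
The projection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an RBF kernel, omits any centring step the paper may apply, and orthonormalises the cohort means with a Cholesky factor of their Gram matrix (equivalent to Gram-Schmidt). The function names `rbf_kernel`, `fit_cohort_projection` and `project`, the parameter `gamma`, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel matrix between rows of X and rows of Y (assumed kernel choice)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_cohort_projection(K, labels):
    """Dual coefficients W (n x C) of orthonormalised cohort means.

    K is the n x n training kernel matrix. Column j of A holds the weights
    1/n_j on the members of cohort j, so Phi @ A are the cohort means in
    feature space, expressed without ever forming Phi explicitly.
    """
    labels = np.asarray(labels)
    cohorts = np.unique(labels)
    A = np.zeros((K.shape[0], cohorts.size))
    for j, c in enumerate(cohorts):
        members = labels == c
        A[members, j] = 1.0 / members.sum()
    G = A.T @ K @ A              # Gram matrix of the cohort means (C x C)
    L = np.linalg.cholesky(G)    # requires linearly independent means
    B = np.linalg.inv(L).T       # then B.T @ G @ B is the identity
    return A @ B

def project(K_new, W):
    """Coordinates of points given their kernel values against the training set."""
    return K_new @ W

# Toy usage: three Gaussian blobs in two dimensions.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 10)
X = rng.normal(size=(30, 2)) + 3.0 * labels[:, None]
K = rbf_kernel(X, X, gamma=0.5)
W = fit_cohort_projection(K, labels)
Z = project(K, W)                # one row per point, one column per cohort axis
```

The only tunable quantity in this sketch is the kernel parameter `gamma`, which mirrors the abstract's remark that the method's parameters are those of the kernel function. New points are projected by passing `rbf_kernel(X_new, X, gamma)` to `project`.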