A local semi-supervised Sammon algorithm for textual data visualization

Authors:
Manuel Martín-Merino; Ángela Blanco
Affiliations:
Universidad Pontificia de Salamanca, Salamanca, Spain 37002 (both authors)
Venue:
Journal of Intelligent Information Systems
Year:
2009

Abstract

Sammon's mapping is a powerful non-linear technique that allows us to visualize the relationships among high-dimensional objects. It has been applied to a broad range of practical problems, in particular to the visualization of the semantic relations among terms in textual databases. However, the word maps generated by Sammon's mapping suffer from low discriminant power, owing both to the well-known "curse of dimensionality" and to the unsupervised nature of the algorithm. Fortunately, textual databases frequently provide a manually created classification for a subset of the documents, which may help to overcome this problem. In this paper we first introduce a modification of Sammon's mapping (SSammon) that emphasizes the local topology, reducing its sensitivity to the "curse of dimensionality". Next, we propose a semi-supervised version that exploits the a priori categorization of a subset of the documents to improve the discriminant power of the generated word maps. The new algorithm has been applied to the challenging problem of word map generation. The experimental results suggest that the new model significantly improves on well-known unsupervised alternatives.
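As background for the baseline the abstract starts from, here is a minimal sketch of the classic Sammon mapping. It is not the SSammon or semi-supervised algorithm proposed in this paper, whose details the abstract does not give. The sketch minimizes Sammon's stress E = (1 / sum_{i<j} d*_ij) * sum_{i<j} (d*_ij - d_ij)^2 / d*_ij, where d*_ij are input-space distances and d_ij are map distances. Assumptions: Euclidean input distances, a PCA initialization, and plain gradient descent with step halving in place of Sammon's original pseudo-Newton update. All function names are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def sammon_stress(D_star, Y):
    """Sammon's stress between input distances D_star and the distances of map Y."""
    iu = np.triu_indices(len(Y), k=1)
    d_star, d = D_star[iu], pairwise_distances(Y)[iu]
    return np.sum((d_star - d) ** 2 / d_star) / np.sum(d_star)


def sammon_gradient(D_star, Y, eps=1e-12):
    """Analytic gradient of sammon_stress with respect to the map coordinates Y."""
    iu = np.triu_indices(len(Y), k=1)
    c = np.sum(D_star[iu])
    D = np.maximum(pairwise_distances(Y), eps)  # avoid dividing by zero
    W = (D - D_star) / (D_star * D)
    np.fill_diagonal(W, 0.0)
    # sum_j W_ij (y_i - y_j), written with matrix products
    return (2.0 / c) * (W.sum(axis=1)[:, None] * Y - W @ Y)


def sammon(X, n_components=2, n_iter=200, lr=1.0, eps=1e-12):
    """Classic Sammon mapping (hypothetical sketch): gradient descent with step halving.

    Returns the map Y and the stress after every accepted step.
    """
    X = np.asarray(X, dtype=float)
    D_star = np.maximum(pairwise_distances(X), eps)  # guard against duplicate points
    # PCA initialisation: project the centred data on the top principal axes
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:n_components].T
    history = [sammon_stress(D_star, Y)]
    for _ in range(n_iter):
        Y_new = Y - lr * sammon_gradient(D_star, Y, eps)
        new_stress = sammon_stress(D_star, Y_new)
        if new_stress < history[-1]:
            Y = Y_new
            history.append(new_stress)
            lr *= 1.1  # cautiously enlarge the step after a success
        else:
            lr *= 0.5  # overshoot: shrink the step and retry
    return Y, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 10))  # synthetic stand-in for term vectors
    Y, history = sammon(X)
    print(f"stress: {history[0]:.4f} (PCA) -> {history[-1]:.4f} (Sammon)")
```

Because each squared error is divided by d*_ij, mismatches on small input distances cost more than mismatches on large ones. This weighting is what gives Sammon's mapping its bias toward preserving local structure, the property that the modification described in the abstract sets out to strengthen.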