Data visualization aims to obtain a graphic representation of high-dimensional information. The data are projected onto a lower-dimensional space, where structure in the projections is sought. Among the available projection-based methods, the Generative Topographic Mapping (GTM) has become an important probabilistic framework for modeling data. Applying it to document data requires replacing the original Gaussian model with one suited to binary or multinomial variables. Several modifications of GTM handle this kind of data, but the resulting latent projections are scattered across the visualization plane. This paper proposes a document visualization method based on a generative probabilistic model consisting of a mixture of zero-inflated Poisson distributions. The method is evaluated in terms of cluster formation in the latent projections, using an index based on Fisher's classifier, and its topology preservation capability is measured with Sammon's stress error. A comparison with GTM implementations using Gaussian, multinomial, and Poisson distributions, and with a Latent Dirichlet model, shows better performance for the proposed method. A graphic presentation of the projections is also provided, showing the advantage of the developed method in terms of visualization and class separation. A detailed analysis of some documents projected onto the latent representation showed that most documents appearing away from their corresponding cluster could be identified as outliers.
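Two of the quantities named in the abstract have standard textbook definitions, and a small sketch may help make them concrete. The Python below is illustrative only and is not the paper's implementation: the function names `zip_pmf` and `sammon_stress`, the parameter names `lam` (Poisson rate) and `pi` (probability of a structural zero), and the use of Euclidean distances are all assumptions. It shows the probability mass function of a single zero-inflated Poisson component and the usual form of Sammon's stress between a data set and its projection.

```python
import math


def zip_pmf(k, lam, pi):
    """Probability of count k under a zero-inflated Poisson distribution.

    Assumed parameterization: with probability pi the count is a
    structural zero; otherwise it is drawn from Poisson(lam).
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def sammon_stress(high, low):
    """Sammon's stress between original points and their projections.

    high and low are equal-length sequences of coordinate tuples.
    Euclidean distance is assumed in both spaces.
    """
    n = len(high)
    total = 0.0  # sum of pairwise distances in the original space
    err = 0.0    # sum of distance-weighted squared mismatches
    for i in range(n):
        for j in range(i + 1, n):
            d_high = math.dist(high[i], high[j])
            d_low = math.dist(low[i], low[j])
            if d_high == 0.0:
                continue  # coincident points carry no weight
            total += d_high
            err += (d_high - d_low) ** 2 / d_high
    return err / total if total > 0.0 else 0.0
```

A stress of zero means all pairwise distances are preserved exactly; larger values indicate a more distorted projection. In a mixture setting, each component would carry its own `lam` and `pi` per term, which is beyond what this sketch covers.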