Large Datasets Visualization with Neural Network Using Clustered Training Data

  • Authors:
  • Sergėjus Ivanikovas;Gintautas Dzemyda;Viktor Medvedev

  • Affiliations:
  • Institute of Mathematics and Informatics, Vilnius, Lithuania LT-08663 and Vilnius Pedagogical University, Vilnius, Lithuania LT-08106;Institute of Mathematics and Informatics, Vilnius, Lithuania LT-08663 and Vilnius Pedagogical University, Vilnius, Lithuania LT-08106;Institute of Mathematics and Informatics, Vilnius, Lithuania LT-08663 and Vilnius Pedagogical University, Vilnius, Lithuania LT-08106

  • Venue:
  • ADBIS '08 Proceedings of the 12th East European conference on Advances in Databases and Information Systems
  • Year:
  • 2008

Visualization

Abstract

This paper presents the visualization of large datasets with the SAMANN algorithm, using clustering methods to reduce the initial dataset for network training. The visualization of multidimensional data is highly important in data mining because recent applications produce large amounts of data that require specific means for knowledge discovery. One way to visualize a multidimensional dataset is to project it onto a plane. This paper analyzes the visualization of multidimensional data using a feed-forward neural network. We investigate an unsupervised backpropagation algorithm to train a multilayer feed-forward neural network (SAMANN) to perform Sammon's nonlinear projection. The SAMANN network offers the generalization ability of projecting new data. Previous investigations showed that it is possible to train SAMANN using only a part of the analyzed dataset without loss of accuracy. It is very important to select a proper vector subset for the neural network training. One way to construct a relevant training subset is to use clustering, which speeds up the visualization of large datasets.
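
The idea of building a training subset by clustering can be illustrated with a minimal sketch. The abstract does not say which clustering method or selection rule the authors use, so everything here is an assumption made for illustration: plain k-means, taking the data point nearest to each centroid as a representative, and the hypothetical helper names `kmeans_representatives` and `sammon_stress`. The second function computes the standard Sammon stress, the projection error that a SAMANN-style network is trained to reduce; the network itself is not implemented here.

```python
import numpy as np


def kmeans_representatives(X, k, n_iter=20, seed=0):
    """Run plain Lloyd's k-means and return the indices of the data
    points closest to the final centroids (an assumed selection rule)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Nearest data point per centroid; duplicates are merged.
    return np.unique(d2.argmin(axis=0))


def sammon_stress(X, Y, eps=1e-12):
    """Sammon's stress between pairwise distances in the input space X
    and in the projection Y (rows of X and Y must correspond)."""
    iu = np.triu_indices(len(X), k=1)  # each unordered pair once
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)[iu]
    dy = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)[iu]
    # eps guards against division by zero for coincident points.
    return float(((dx - dy) ** 2 / (dx + eps)).sum() / dx.sum())


# Synthetic data, purely for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))

idx = kmeans_representatives(X, k=20)
X_train = X[idx]  # reduced subset that a projection network would be trained on

# Stress of a naive "projection" that keeps only the first two coordinates.
naive_stress = sammon_stress(X_train, X_train[:, :2])
```

In a setup like this, only `X_train` would be used for training, and the remaining points would be projected afterwards through the trained network's generalization ability.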