Online data visualization of multidimensional databases using the hilbert space-filling curve

  • Authors:
  • Jose Castro;Steven Burns

  • Affiliations:
  • Costa Rica Institute of Technology;Costa Rica Institute of Technology

  • Venue:
  • VIEW'06 Proceedings of the 1st first visual information expert conference on Pixelization paradigm
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose in this paper a visualization approach for large online databases using the Hilbert space-filling curve to map N-dimensional data points to 2D or 3D points. Dimensionality reduction methods like principal component analysis (PCA), multi dimensional scaling (MDS) or self organizing maps (SOMS) can map N-dimensional data points with N≫3 into 3 dimensional or 2 dimensional values that allow us to visualize the data. These methods although popular, require either the calculation of a scatter matrix, eigenvalues and eigenvectors, or the iteration of learning algorithms. Therefore these methods cannot perform online, can be slow with large databases and always produce information loss when the data is mapped from the multidimensional space to the 2D or 3D image. Space-filling curves like the Peano, Z, and Hilbert curve, on the contrary, produce a 1-to-1 mapping between points in a line segment and an arbitrary N-Dimensional hypercube. This 1-to-1 mapping guarantees that there is no information loss on the transformation. Specifically the Hilbert space-filling curve is known to preserve the Lebesgue measure and has been proven to produce an optimal mapping in the sense that an arbitrary contiguous block of information will receive the minimum number of splits in the mapped space. The Hilbert space-filling curve has been extensively used for indexing and clustering by mapping N-dimensional data points to 1-dimensional values. We propose here to use the curve to map to 2 or 3 dimensions for purposes of visualization: By taking advantage of its 1-to-1 nature, a new and generic method to map N-dimensional data points to 2D or 3D points using the Hilbert space-filling curve is developed. We prove theoretically that the calculation of the mapping can be done in constant time if we fix the order of approximation, thereby giving linear O(n) performance on the number of data points to map. We create a Hilbert space-filling curve visualization tool that is much faster than the other methods mentioned and allows us to generate quickly for very large datasets various different visualizations of the data, thereby compensating the lack of use of statistical information in the calculation of the mapped points. We compare our approach to MDS and PCA with a benchmark data set and three real datasets using the distance preserving and topology preserving measure as benchmarks. Our experiments indicate that the Hilbert space-filling curve produces acceptable quality of mapping while achieving much faster visualization and is therefore especially useful for online visualization of very large data sets.