A new approach to discover interlacing data structures in high-dimensional space

  • Authors:
  • Tao Ban; Changshui Zhang; Shigeo Abe

  • Affiliations:
  • Information Security Research Center, National Institute of Information and Communications Technology, Tokyo, Japan 184-8795; Department of Automation, Tsinghua University, Beijing, China 100-083; Graduate School of Science and Technology, Kobe University, Kobe, Japan 657-8501

  • Venue:
  • Journal of Intelligent Information Systems
  • Year:
  • 2009

Abstract

Discovering the structures hidden in high-dimensional data is important for understanding the data and for processing it further. Real-world datasets are often composed of multiple low-dimensional patterns, and the interlacing of these patterns can obscure the rules governing the data distribution. Few existing methods focus on detecting and extracting the manifolds that represent distinct patterns. Inspired by the nonlinear dimensionality reduction method Isomap, this paper presents a novel approach called Multi-Manifold Partition to identify interlacing low-dimensional patterns. The algorithm has three steps. First, a neighborhood graph is built to capture the intrinsic topological structure of the input data. Second, the dimensional uniformity of neighboring nodes is analyzed to discover segments of patterns. Finally, segments that are likely to come from the same low-dimensional structure are combined to obtain a global representation of the distribution rules. Experiments on synthetic data and on real-world problems are reported. The results show that this approach to exploratory data analysis is effective and may improve understanding of the data distribution.
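
As an illustration only, the first two steps named in the abstract (neighborhood graph, then dimensional uniformity of neighboring nodes) might be sketched as below. This is not the authors' implementation. The abstract does not say how dimensional uniformity is measured, so the sketch assumes local PCA with a 95% variance threshold, a brute-force k-nearest-neighbor graph with k = 8, and a plain union-find grouping. The function names `knn_indices`, `local_dimension`, and `segments` are hypothetical. The third step (merging segments that belong to the same structure) is left out because the abstract gives no criterion for it.

```python
import numpy as np

def knn_indices(X, k):
    """Step 1 (assumed form): brute-force k-nearest-neighbor graph."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def local_dimension(X, nbrs, var_kept=0.95):
    """Step 2a (assumption): local PCA on each point plus its neighbors.

    The local dimension is the number of principal directions needed
    to retain `var_kept` of the patch variance.
    """
    dims = np.empty(len(X), dtype=int)
    for i, idx in enumerate(nbrs):
        patch = X[np.append(idx, i)]
        patch = patch - patch.mean(axis=0)
        ev = np.linalg.eigvalsh(patch.T @ patch)[::-1]  # descending order
        ev = np.clip(ev, 0.0, None)                     # drop rounding noise
        ratios = np.cumsum(ev) / ev.sum()
        dims[i] = min(int(np.searchsorted(ratios, var_kept)) + 1, X.shape[1])
    return dims

def segments(nbrs, dims):
    """Step 2b (assumption): connected components of the k-NN graph,
    keeping only edges whose endpoints share the same local dimension."""
    parent = list(range(len(dims)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, idx in enumerate(nbrs):
        for j in idx:
            if dims[i] == dims[j]:
                parent[find(i)] = find(int(j))
    return np.array([find(i) for i in range(len(dims))])

# Toy data: a 2-D grid patch in the plane z = 0 and a 1-D line along the z-axis.
g = np.linspace(-1.0, 1.0, 15)
plane = np.array([[x, y, 0.0] for x in g for y in g])
line = np.column_stack([np.zeros(40), np.zeros(40), np.linspace(1.0, 3.0, 40)])
X = np.vstack([plane, line])

nbrs = knn_indices(X, k=8)
dims = local_dimension(X, nbrs)
labels = segments(nbrs, dims)
print(np.unique(dims), len(np.unique(labels)))
```

The toy data keeps the line and the plane apart so that every neighborhood is drawn from a single pattern. Where patterns actually cross, neighborhoods near the intersection mix points from both, and this sketch makes no attempt to resolve that case.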