Using Data Compression for Multidimensional Distribution Analysis

  • Authors:
  • Kentaro Onizuka;Tamotsu Noguchi;Yutaka Akiyama;Hideo Matsuda

  • Venue:
  • IEEE Intelligent Systems
  • Year:
  • 2002

Abstract

The authors propose a method for analyzing multidimensional distributions using a data compression technique. The method avoids an explosion in the number of parameters (or coefficients) needed to represent a distribution, even when the distribution has many dimensions (six or more). The distribution is expanded linearly into a set of expansion coefficients; the expansion procedure neglects high-order cross-terms, which reduces the total number of coefficients representing the distribution. This compression technique resembles the DCT-based image data compression used in computer vision.

The authors applied the method to knowledge-based mean-force potentials between residues for the analysis of protein sequence-structure compatibility. They derive the mean-force potentials from the multidimensional distribution of relative configurations (essentially 6D) between residues. In native-structure-recognition tests, the multidimensional mean-force potentials performed much better than conventional 1D distance-based potentials derived from binned distributions.
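The idea of expanding a distribution in a separable basis and discarding high-order cross-terms can be illustrated with a minimal sketch. This is not the authors' implementation: the orthonormal DCT-II basis, the function names (`expand`, `truncate`, `reconstruct`, `count_kept`), the grid sizes, and the cutoff rule `max_order` (the maximum number of dimensions in which a kept term may vary) are all assumptions chosen for illustration.

```python
from math import comb
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis function."""
    i = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i[None, :] + 0.5) * i[:, None] / n)
    C[0, :] /= np.sqrt(2.0)  # the constant row carries an extra 1/sqrt(2)
    return C

def _transform(arr, inverse):
    """Apply the 1D transform (or its transpose) along every axis in turn."""
    out = arr
    for axis in range(arr.ndim):
        C = dct_matrix(arr.shape[axis])
        M = C.T if inverse else C
        out = np.moveaxis(np.tensordot(M, out, axes=(1, axis)), 0, axis)
    return out

def expand(hist):
    """Linear expansion of a gridded distribution into separable coefficients."""
    return _transform(hist, inverse=False)

def reconstruct(coeffs):
    """Inverse of expand (the basis is orthonormal)."""
    return _transform(coeffs, inverse=True)

def cross_term_order(shape):
    """For each coefficient, the number of axes with a non-constant basis index."""
    return (np.indices(shape) != 0).sum(axis=0)

def truncate(coeffs, max_order):
    """Zero every cross-term that varies in more than max_order dimensions."""
    return np.where(cross_term_order(coeffs.shape) <= max_order, coeffs, 0.0)

def count_kept(n_dims, n_basis, max_order):
    """Number of coefficients that survive truncation (hypothetical cutoff rule)."""
    return sum(comb(n_dims, m) * (n_basis - 1) ** m for m in range(max_order + 1))

# Toy 3D distribution that is purely additive, so it has no true cross-terms.
rng = np.random.default_rng(0)
g, h, k = rng.random(4), rng.random(4), rng.random(4)
hist = g[:, None, None] + h[None, :, None] + k[None, None, :]

approx = reconstruct(truncate(expand(hist), max_order=1))
print("additive case recovered:", np.allclose(approx, hist))

# Coefficient counts for an assumed 6D grid with 8 basis functions per axis,
# keeping terms up to pairwise cross-terms versus keeping everything.
print(count_kept(6, 8, 2), "vs", 8 ** 6)
```

Under these assumed settings, keeping terms up to pairwise cross-terms in six dimensions leaves 778 coefficients instead of 262,144, which shows how the coefficient count can stay manageable as dimensionality grows. How much accuracy is lost depends on how strongly the real distribution couples three or more dimensions at once.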