Random hyperplane projection using derived dimensions

  • Authors: Konstantinos Georgoulas; Yannis Kotidis

  • Affiliations: Athens University of Economics and Business, Athens, Greece (both authors)

  • Venue: Proceedings of the Ninth ACM International Workshop on Data Engineering for Wireless and Mobile Access
  • Year: 2010

Abstract

Computing the similarity between data objects is a fundamental operation for many distributed applications, such as those on the World Wide Web, in peer-to-peer networks, or even in sensor networks. Locality Sensitive Hashing (LSH) has recently been proposed to reduce the number of bits that must be transmitted between sites in order to evaluate different similarity functions between data objects. In our work we investigate a particular form of LSH, termed Random Hyperplane Projection (RHP). RHP is a data-agnostic model that works for arbitrary data sets. However, data in most applications is not uniform. We first describe the shortcomings of the RHP scheme, in particular its inability to exploit evident skew in the underlying data distribution, and then propose a novel framework that automatically detects correlations and computes an RHP embedding in the Hamming cube tailored to the provided data set. We further discuss extensions of our framework that cope with changes in the data distribution and with outliers. In such cases our technique automatically reverts to the basic RHP model for data items that cannot be described accurately through the computed embedding. Our experimental evaluation on several real data sets demonstrates that the proposed scheme outperforms the existing RHP algorithm, providing up to three times more accurate similarity computations using the same number of bits.
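
For readers unfamiliar with the baseline, the following is a minimal sketch of the basic, data-agnostic RHP scheme that the abstract refers to. It is not the authors' derived-dimension framework, whose details are not given here. It assumes the standard construction: each bit records on which side of a random hyperplane (with Gaussian-distributed normal) a vector falls, two signatures differ in any given bit with probability angle/π, and so the Hamming distance yields an estimate of the cosine similarity. All function names and parameters (`make_hyperplanes`, `rhp_signature`, `num_bits`, `seed`) are hypothetical and chosen only for illustration.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits hyperplane normals with i.i.d. Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def rhp_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return [
        1 if sum(r * x for r, x in zip(normal, vector)) >= 0 else 0
        for normal in hyperplanes
    ]


def hamming(sig_a, sig_b):
    """Number of positions in which two equal-length signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


def estimated_cosine(sig_a, sig_b):
    """Bits differ with probability angle/pi, so angle ~ pi * hamming / num_bits."""
    return math.cos(math.pi * hamming(sig_a, sig_b) / len(sig_a))


def exact_cosine(u, v):
    """Exact cosine similarity, used only to compare against the estimate."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Both sites must share the same hyperplanes (here: the same seed).
    planes = make_hyperplanes(num_bits=2048, dim=3, seed=42)
    u, v = [1.0, 2.0, 3.0], [2.0, 1.0, 0.5]
    sig_u, sig_v = rhp_signature(u, planes), rhp_signature(v, planes)
    print("estimated:", estimated_cosine(sig_u, sig_v))
    print("exact:    ", exact_cosine(u, v))
```

In this sketch only the bit signatures would need to be exchanged between sites, which is the source of the bandwidth saving. The hyperplanes are drawn without looking at the data, which is the data-agnostic property that the abstract identifies as a weakness on skewed data.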