Random hyperplane projection using derived dimensions

Authors:
Konstantinos Georgoulas;Yannis Kotidis
Affiliations:
Athens University of Economics and Business, Athens, Greece;Athens University of Economics and Business, Athens, Greece
Venue:
Proceedings of the Ninth ACM International Workshop on Data Engineering for Wireless and Mobile Access
Year:
2010

Abstract

Computing the similarity between data objects is a fundamental operation for many distributed applications, such as those on the World Wide Web, in Peer-to-Peer networks, or even in Sensor Networks. Locality Sensitive Hashing (LSH) has recently been proposed as a way to reduce the number of bits that must be transmitted between sites while still permitting the evaluation of different similarity functions between data objects. In this work we investigate a particular form of LSH, termed Random Hyperplane Projection (RHP). RHP is a data-agnostic model that works for arbitrary data sets. However, data in most applications is not uniform. We first describe the shortcomings of the RHP scheme, in particular its inability to exploit evident skew in the underlying data distribution, and then propose a novel framework that automatically detects correlations and computes an RHP embedding in the Hamming cube tailored to the provided data set. We further discuss extensions of our framework that cope with changes in the data distribution or with outliers. In such cases our technique automatically reverts to the basic RHP model for data items that cannot be described accurately through the computed embedding. Our experimental evaluation using several real data sets demonstrates that our proposed scheme outperforms the existing RHP algorithm, providing up to three times more accurate similarity computations using the same number of bits.
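For readers unfamiliar with the baseline, the following is a minimal sketch of the basic, data-agnostic RHP scheme (random hyperplane hashing in the style of Charikar's similarity estimation), not the derived-dimension framework proposed here. Each of k random hyperplanes, with i.i.d. standard Gaussian normal vectors, contributes one bit recording which side of the hyperplane a vector lies on. For two vectors at angle θ, a single bit differs with probability θ/π, so the fraction of differing bits in the k-bit signatures estimates θ/π, and cos(θ) estimates their cosine similarity. Only the signatures would need to travel between sites. All function names (`make_hyperplanes`, `rhp_signature`, `estimate_cosine`, and so on) and the shared-seed convention are illustrative assumptions.

```python
import math
import random


def make_hyperplanes(dim: int, num_bits: int, seed: int = 0) -> list[list[float]]:
    """Draw num_bits hyperplane normals with i.i.d. N(0, 1) coordinates.

    Assumption: sites agree on a seed so each can regenerate the same
    hyperplanes locally instead of transmitting them.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def rhp_signature(x: list[float], hyperplanes: list[list[float]]) -> list[int]:
    """One bit per hyperplane: 1 if x lies on the non-negative side, else 0."""
    return [
        1 if sum(r_i * x_i for r_i, x_i in zip(r, x)) >= 0.0 else 0
        for r in hyperplanes
    ]


def hamming(a: list[int], b: list[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def estimate_cosine(sig_a: list[int], sig_b: list[int]) -> float:
    """Estimate cos(theta), using (differing bits / k) as an estimate of theta / pi."""
    theta = math.pi * hamming(sig_a, sig_b) / len(sig_a)
    return math.cos(theta)


def exact_cosine(x: list[float], y: list[float]) -> float:
    """Exact cosine similarity, for comparison with the bit-based estimate."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, num_bits=1024, seed=42)
    x = [1.0, 2.0, 3.0]
    y = [1.5, 1.8, 3.2]
    sx, sy = rhp_signature(x, planes), rhp_signature(y, planes)
    print("exact:", exact_cosine(x, y), "estimated:", estimate_cosine(sx, sy))
```

The estimate's variance shrinks as the number of bits k grows. Because the hyperplanes are drawn independently of the data, this baseline spends bits uniformly and ignores any skew or correlation in the data set. That is the limitation the abstract attributes to plain RHP.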