Computing the similarity between data objects is a fundamental operation for many distributed applications, such as those on the World Wide Web, in peer-to-peer networks, and in sensor networks. In our work, we provide a framework based on Random Hyperplane Projection (RHP) that permits continuous computation of similarity estimates (using the cosine similarity or the correlation coefficient as the similarity metric) between data descriptions streamed from remote sites. These estimates are computed at a monitoring node without transmitting the actual data values. The original RHP framework is data-agnostic and works for arbitrary data sets. However, data in most applications is not uniform. We first describe the shortcomings of the RHP scheme, in particular its inability to exploit evident skew in the underlying data distribution. We then propose a novel framework that automatically detects correlations and computes an RHP embedding in the Hamming cube tailored to the given data set, using the notion of derived dimensions, which we introduce. We further discuss extensions of our framework that cope with changes in the data distribution: in such cases, our technique automatically reverts to the basic RHP model for data items that cannot be described accurately through the computed embedding. Our experimental evaluation on several real and synthetic data sets demonstrates that the proposed scheme outperforms the existing RHP algorithm and other previously proposed techniques, providing significantly more accurate similarity estimates using the same number of bits.
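For readers unfamiliar with the baseline, the sketch below illustrates the basic, data-agnostic RHP scheme the abstract starts from, not the derived-dimension embedding the paper proposes. It relies on the standard random-hyperplane property: for a hyperplane with a rotationally symmetric random normal, the sign bits of two vectors differ with probability θ/π, where θ is the angle between them, so the Hamming distance between two d-bit sketches yields an estimate of cos θ. The correlation coefficient is obtained by applying the same estimator to mean-centered vectors. The function names, the shared random seed, the Gaussian normals, and the sketch length are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def make_hyperplanes(n, d, seed):
    """Draw d random hyperplane normals in R^n with i.i.d. N(0, 1) entries.

    Assumption: remote sites and the monitor share `seed`, so they agree
    on the hyperplanes without exchanging them.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]


def rhp_sketch(x, hyperplanes):
    """One bit per hyperplane: 1 if x lies on its non-negative side, else 0."""
    return [1 if sum(h_i * x_i for h_i, x_i in zip(h, x)) >= 0 else 0
            for h in hyperplanes]


def estimate_cosine(bits_a, bits_b):
    """Estimate cos(theta) from the Hamming distance of two d-bit sketches.

    Each bit differs with probability theta / pi, so theta is estimated
    as pi * hamming / d.
    """
    d = len(bits_a)
    hamming = sum(a != b for a, b in zip(bits_a, bits_b))
    return math.cos(math.pi * hamming / d)


def center(x):
    """Subtract the mean; the cosine of centered vectors is Pearson's r."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def exact_cosine(x, y):
    """Exact cosine similarity, used only to check the estimate."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


if __name__ == "__main__":
    rng = random.Random(7)
    n, d = 20, 4096  # illustrative dimensionality and sketch length (bits)
    planes = make_hyperplanes(n, d, seed=42)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [xi + 0.5 * rng.gauss(0.0, 1.0) for xi in x]
    # Each site transmits only its d-bit sketch; the monitor never sees x or y.
    est = estimate_cosine(rhp_sketch(center(x), planes), rhp_sketch(center(y), planes))
    print(f"estimated r = {est:.3f}, exact r = {exact_cosine(center(x), center(y)):.3f}")
```

Because each site sends d bits regardless of the data, the standard error of the estimated angle shrinks roughly as 1/sqrt(d). The abstract's point is that this uniform treatment ignores skew in the data, which is what the tailored embedding is meant to exploit.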