In this work we present a novel framework for detecting outliers in a data warehouse. We extend the commonly used definition of distance-based outliers to cope with the large data domains that are typical of dimensional modeling of OLAP datasets. Our techniques use a two-level indexing scheme. The first level is based on Locality-Sensitive Hashing (LSH) and replaces range searching, which is very inefficient in high-dimensional spaces, with approximate nearest neighbor computations in an intuitive manner. The second level uses the Piecewise Aggregate Approximation (PAA) technique, which substantially reduces the space required to store the data representations. As will be explained, our method supports incremental updates on the data representation, which is essential for managing the voluminous datasets common in data warehousing applications.
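The abstract does not describe the index in detail, so the following is only a minimal sketch of how the three ingredients can fit together. It rests on several assumptions. It uses the textbook distance-based outlier test, in which an object is an outlier if fewer than `min_neighbors` other objects lie within `radius`, rather than the authors' extended definition. It uses the p-stable (Gaussian projection) LSH family for Euclidean distance. It uses PAA with equal-length segments. The class `LSHOutlierIndex`, its parameters, and the choice to hash PAA vectors instead of raw data are hypothetical and are not the authors' framework.

```python
import math
import random
from collections import defaultdict


def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal-length segments."""
    n = len(series)
    if w <= 0 or n % w != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of w")
    seg = n // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def paa_distance(a, b, n):
    """sqrt(n/w) * ||a - b||, a lower bound on the Euclidean distance of the length-n originals."""
    w = len(a)
    return math.sqrt(n / w) * math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class LSHOutlierIndex:
    """Hypothetical two-level index: only PAA vectors are kept, bucketed in L LSH tables."""

    def __init__(self, n, w, num_tables=8, hashes_per_table=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.n, self.w, self.width = n, w, bucket_width
        # Each table concatenates k hashes h(v) = floor((a.v + b) / W),
        # with a drawn from N(0, 1)^w and b drawn uniformly from [0, W).
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(w)], rng.uniform(0.0, bucket_width))
             for _ in range(hashes_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(set) for _ in range(num_tables)]
        self.reps = {}  # object id -> PAA representation (the only stored copy of the data)

    def _key(self, t, v):
        """Bucket key of vector v in table t: the tuple of its k hash values."""
        return tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.width)
                     for a, b in self.funcs[t])

    def insert(self, oid, series):
        """Incremental update: only the new object's L buckets are touched."""
        v = paa(series, self.w)
        self.reps[oid] = v
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].add(oid)

    def is_outlier(self, oid, radius, min_neighbors):
        """Approximate test: count LSH candidates (excluding oid itself) within radius."""
        v = self.reps[oid]
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates |= table.get(self._key(t, v), set())
        candidates.discard(oid)
        close = sum(1 for c in candidates
                    if paa_distance(v, self.reps[c], self.n) <= radius)
        return close < min_neighbors


# Usage: twenty nearly identical series plus one distant series.
idx = LSHOutlierIndex(n=8, w=4, seed=42)
for i in range(20):
    idx.insert(i, [0.001 * i] * 8)
idx.insert("far", [50.0] * 8)
print(idx.is_outlier("far", radius=2.0, min_neighbors=3))
```

The neighbor search only inspects objects that share a bucket with the query, so no range scan over the whole dataset is needed. Because LSH can miss true neighbors, the test may occasionally flag an inlier. Adding tables lowers that risk at the cost of memory. Storing PAA vectors instead of raw series shrinks each representation by a factor of n/w. Since the PAA distance never exceeds the true distance, the reduction itself does not hide neighbors that are genuinely within `radius`.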