Multiscale histograms: summarizing topological relations in large spatial datasets

Authors:
Xuemin Lin;Qing Liu;Yidong Yuan;Xiaofang Zhou
Affiliations:
University of NSW, Sydney, Australia;University of NSW, Sydney, Australia;University of NSW, Sydney, Australia;University of Queensland, Brisbane, Australia
Venue:
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Year:
2003

Abstract

Summarizing topological relations is fundamental to many spatial applications, including spatial query optimization. In this paper, we present several novel techniques for constructing cell-density-based spatial histograms that summarize range (window) queries with respect to the four most important topological relations: contains, contained, overlap, and disjoint. We first present a novel framework for constructing a multiscale histogram composed of multiple Euler histograms, which guarantees exact summarization results for aligned windows in constant time. We then present an approximation algorithm, with approximation ratio 19/12, that minimizes the storage space of such multiscale Euler histograms; the underlying problem is NP-hard in general. For settings where storage is limited and only k Euler histograms are allowed, we present an effective algorithm that constructs multiscale histograms with high accuracy. Finally, we present a new constant-time approximate algorithm for querying an Euler histogram in cases where exact answers cannot be guaranteed. Our extensive experiments on both synthetic and real-world datasets demonstrate that the approximate multiscale histogram techniques may improve on the accuracy of existing techniques by several orders of magnitude while retaining their cost efficiency, and that the exact multiscale histogram technique requires storage space only linearly proportional to the number of cells for the real datasets.
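
For readers unfamiliar with the building block named in the abstract, the following is a minimal sketch of a single Euler histogram, not the multiscale construction the paper proposes. It assumes objects are axis-aligned rectangles snapped to grid cells and that query windows are aligned to the grid. The class name, method names, and the (x1, x2, y1, y2) inclusive cell-range convention are illustrative assumptions, not taken from the paper. The sketch shows only how intersecting and disjoint counts for an aligned window can be read off in constant time; separating contains, contained, and overlap is the part that needs the paper's techniques.

```python
class EulerHistogram:
    """Single Euler histogram over an nx-by-ny grid (illustrative sketch).

    The grid is stored as a (2*nx-1) x (2*ny-1) lattice:
      (even, even) entries are cells      and receive +1 per covering object,
      mixed-parity entries are edges      and receive -1,
      (odd, odd) entries are vertices     and receive +1.
    Only edges and vertices strictly inside the grid are represented.
    """

    def __init__(self, nx, ny):
        self.w, self.h = 2 * nx - 1, 2 * ny - 1
        self.lattice = [[0] * self.h for _ in range(self.w)]
        self.total = 0          # number of objects inserted
        self._prefix = None     # 2D prefix sums, rebuilt lazily

    def add(self, x1, x2, y1, y2):
        """Insert an object covering cells x1..x2 by y1..y2 (inclusive)."""
        for i in range(2 * x1, 2 * x2 + 1):
            for j in range(2 * y1, 2 * y2 + 1):
                # parity sum 0 = cell, 1 = edge, 2 = vertex
                sign = -1 if (i % 2 + j % 2) == 1 else 1
                self.lattice[i][j] += sign
        self.total += 1
        self._prefix = None

    def _build_prefix(self):
        p = [[0] * (self.h + 1) for _ in range(self.w + 1)]
        for i in range(self.w):
            for j in range(self.h):
                p[i + 1][j + 1] = (self.lattice[i][j]
                                   + p[i][j + 1] + p[i + 1][j] - p[i][j])
        self._prefix = p

    def count_intersecting(self, a, b, c, d):
        """Objects intersecting the aligned window of cells a..b by c..d.

        Each intersecting rectangle contributes cells - edges + vertices = 1
        inside the window, so the signed sum is the object count. After the
        prefix sums are built, this is four lookups (constant time).
        """
        if self._prefix is None:
            self._build_prefix()
        p = self._prefix
        lo_i, hi_i = 2 * a, 2 * b + 1
        lo_j, hi_j = 2 * c, 2 * d + 1
        return p[hi_i][hi_j] - p[lo_i][hi_j] - p[hi_i][lo_j] + p[lo_i][lo_j]

    def count_disjoint(self, a, b, c, d):
        """Objects that do not intersect the window."""
        return self.total - self.count_intersecting(a, b, c, d)


hist = EulerHistogram(4, 4)
hist.add(0, 1, 0, 1)
hist.add(1, 2, 1, 2)
hist.add(3, 3, 3, 3)
print(hist.count_intersecting(0, 1, 0, 1), hist.count_disjoint(0, 1, 0, 1))
```

The lattice layout is one convenient encoding chosen for this sketch; keeping cells, edges, and vertices in a single array lets one prefix-sum table serve all three.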