Efficient spatial sampling of large geographical tables

Authors:
Anish Das Sarma;Hongrae Lee;Hector Gonzalez;Jayant Madhavan;Alon Halevy
Affiliations:
Google, Mountain View, CA, USA;Google, Mountain View, CA, USA;Google, Mountain View, CA, USA;Google, Mountain View, CA, USA;Google, Mountain View, CA, USA
Venue:
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Year:
2012

Abstract

Large-scale map visualization systems play an increasingly important role in presenting geographic datasets to end users. Because these datasets can be extremely large, a map rendering system often must select a small fraction of the data to display in a limited space. This paper addresses the fundamental challenge of thinning: determining appropriate samples of data to show for specific geographical regions and zoom levels. Beyond the sheer scale of the data, thinning is challenging for three reasons: (1) data can consist of complex geographical shapes; (2) rendering must satisfy certain constraints, such as preserving data across zoom levels and adjacent regions; and (3) among the solutions that satisfy the constraints, an optimal one must be chosen based on objectives such as maximality, fairness, and importance of data. This paper formally defines the thinning problem and presents a complete solution to it. First, we express the problem as an integer programming formulation that efficiently solves thinning for the desired objectives. Second, we present more efficient solutions for maximality, based on a depth-first search (DFS) traversal of a spatial tree. Third, we consider the common special case of point datasets and present an even more efficient randomized algorithm. Finally, we have implemented all techniques from this paper in Google Maps visualizations of Fusion Tables, and we describe a set of experiments that demonstrate the tradeoffs among the algorithms.
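
To make the point-dataset case concrete, the following is a minimal sketch of one way a randomized thinning scheme can work; it is not taken from the paper and may differ from the authors' algorithm. It assumes points lie in the unit square, that each zoom level z divides the square into a 2^z by 2^z grid, and that at most k points may be shown per cell. The names `cell_of`, `thin_points`, `max_zoom`, and `k` are hypothetical. Each point receives one random priority that is reused at every zoom level, so a point selected in a coarse cell is also selected in the finer cell that contains it.

```python
import random
from collections import defaultdict


def cell_of(x, y, zoom):
    """Return the grid cell containing (x, y) at a zoom level.

    Assumption: coordinates lie in [0, 1] and zoom level z uses a
    2**z by 2**z grid. Points on the upper boundary are clamped
    into the last cell.
    """
    n = 2 ** zoom
    return (min(int(x * n), n - 1), min(int(y * n), n - 1))


def thin_points(points, max_zoom, k, seed=0):
    """Pick at most k visible points per cell at every zoom level.

    Hypothetical sketch: each point gets a single random priority,
    and a cell shows its k lowest-priority-value points. Because a
    child cell holds a subset of its parent's points, any point
    shown at zoom z is also shown at zoom z + 1.
    """
    rng = random.Random(seed)
    priority = [rng.random() for _ in points]

    visible = {}
    for zoom in range(max_zoom + 1):
        # Group point indices by the cell they fall into.
        cells = defaultdict(list)
        for i, (x, y) in enumerate(points):
            cells[cell_of(x, y, zoom)].append(i)

        shown = set()
        for ids in cells.values():
            # Break priority ties by index so the result is deterministic.
            ids.sort(key=lambda i: (priority[i], i))
            shown.update(ids[:k])
        visible[zoom] = shown
    return visible
```

Reusing a single priority per point is the design choice that keeps the samples consistent across zoom levels; drawing fresh random numbers at each level would let points appear and disappear as the user zooms in.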