Efficiently managing large-scale raster species distribution data in PostgreSQL

Authors:
Jianting Zhang;Michael Gertz;Le Gruenwald
Affiliations:
City College of New York, New York, NY;University of Heidelberg, Heidelberg, Germany;University of Oklahoma, Norman, OK
Venue:
Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
Year:
2009

Abstract

Species distribution data play an important role in biodiversity research, especially in exploring relationships between species and their environment. In recent years, both the number of species being studied and the spatial resolution of species distribution data have been increasing rapidly. It is thus imperative to develop database systems that allow users to efficiently query such large-scale data based on spatial and non-spatial (e.g., taxonomic and phylogenetic) criteria. In this paper, we present our approach to building such a system by integrating several components: a quadtree representation of binary raster data, tree path indexing and query processing in PostgreSQL, and window decomposition techniques for spatial queries. Our contributions are twofold: associating species identifiers with intermediate quadtree nodes, and optimizing the multiple independent queries that result from window query decomposition. Our system enables PostgreSQL to support binary raster data without requiring any changes to the database backend and is suitable for managing large-scale species distribution data. Our experiments using distribution data for more than 4,000 bird species in the Western Hemisphere show that associating species identifiers with quadtree nodes reduces the number of database tuples by more than one third and the average number of identifiers associated with each tuple from 110.6 to 4.8, a significant improvement over classic quadtree-based approaches. With respect to query optimization, optimized queries are 6 to 9.5 times faster than the baseline queries in average response time and 5.5 to 8.3 times faster in maximum response time, for four query window sizes ranging from 0.1 to 5.0 degrees. Our query optimization techniques thus make the system suitable for many interactive applications for querying and exploring species distribution data.
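
The components named in the abstract (a quadtree representation of a binary raster, tree paths as keys, and window decomposition) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the quadrant numbering (0=NW, 1=NE, 2=SW, 3=SE), the half-open window convention, and the prefix test used to match window blocks against stored nodes are all assumptions made for illustration, and the sketch omits both the PostgreSQL layer and the association of species identifiers with intermediate nodes.

```python
from typing import List, Sequence

# Assumed quadrant order: digit -> (x offset, y offset) in units of half the block size.
QUADRANTS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # 0=NW, 1=NE, 2=SW, 3=SE


def quadtree_paths(grid: Sequence[Sequence[int]], x0: int = 0, y0: int = 0,
                   size: int = 0, path: str = "") -> List[str]:
    """Return tree paths of the maximal all-ones blocks of a square binary grid.

    The grid side length is assumed to be a power of two. The root block has
    the empty path; each level appends one quadrant digit.
    """
    if size == 0:
        size = len(grid)
    cells = [grid[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    if all(cells):
        return [path]      # uniform presence: one leaf covers the whole block
    if not any(cells):
        return []          # uniform absence: nothing to store
    half = size // 2
    out: List[str] = []
    for digit, (dx, dy) in enumerate(QUADRANTS):
        out += quadtree_paths(grid, x0 + dx * half, y0 + dy * half, half, path + str(digit))
    return out


def decompose_window(wx0: int, wy0: int, wx1: int, wy1: int, size: int,
                     x0: int = 0, y0: int = 0, path: str = "") -> List[str]:
    """Decompose the half-open cell window [wx0, wx1) x [wy0, wy1) into
    maximal quadtree blocks, returned as tree paths."""
    if wx1 <= x0 or wx0 >= x0 + size or wy1 <= y0 or wy0 >= y0 + size:
        return []          # block lies entirely outside the window
    if wx0 <= x0 and wy0 <= y0 and wx1 >= x0 + size and wy1 >= y0 + size:
        return [path]      # block lies entirely inside the window
    half = size // 2
    out: List[str] = []
    for digit, (dx, dy) in enumerate(QUADRANTS):
        out += decompose_window(wx0, wy0, wx1, wy1, half,
                                x0 + dx * half, y0 + dy * half, path + str(digit))
    return out


def window_hits(stored_paths: Sequence[str], window_blocks: Sequence[str]) -> bool:
    """Two quadtree blocks overlap exactly when one path is a prefix of the other."""
    return any(s.startswith(w) or w.startswith(s)
               for s in stored_paths for w in window_blocks)


if __name__ == "__main__":
    # Hypothetical 4x4 presence raster for a single species.
    grid = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    stored = quadtree_paths(grid)
    blocks = decompose_window(1, 1, 3, 3, 4)
    print(stored, blocks, window_hits(stored, blocks))
```

In this sketch, each block produced by window decomposition becomes an independent prefix lookup against the stored paths, which is the kind of multi-query workload the abstract's optimization step targets.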