Performance analysis of a dual-tree algorithm for computing spatial distance histograms

Authors:
Shaoping Chen;Yi-Cheng Tu;Yuni Xia
Affiliations:
Department of Mathematics, Wuhan University of Technology, Wuhan, People's Republic of China 430070;Department of Computer Science and Engineering, The University of South Florida, Tampa, USA 33620;Computer and Information Science Department, Indiana University-Purdue University Indianapolis, Indianapolis, USA 46202
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2011

Abstract

Many scientific and engineering fields produce large volumes of spatiotemporal data. The storage, retrieval, and analysis of such data pose great challenges to database system design. Analysis of scientific spatiotemporal data often involves computing functions of all point-to-point interactions. One such analytic, the Spatial Distance Histogram (SDH), is of vital importance to scientific discovery. Recently, algorithms for efficient SDH processing in large-scale scientific databases have been proposed. These algorithms adopt a recursive tree-traversal strategy to process point-to-point distances in the visited tree nodes in batches. They therefore require less time than the brute-force approach, in which all pairwise distances must be computed. Despite promising experimental results, the complexity of such algorithms has not been thoroughly studied. In this paper, we present an analysis of such algorithms based on a geometric modeling approach. The main technique is to transform the analysis of point counts into a problem of quantifying the area of regions where the algorithm can process pairwise distances in batches. From the analysis, we conclude that the number of pairwise distances left to be processed decreases exponentially as more levels of the tree are visited. This leads to a proof that the algorithm's time complexity is lower than the quadratic time needed by a brute-force algorithm, and it lays the foundation for a constant-time approximate algorithm. Our model is also general: it works for a wide range of spatial point distributions, histogram types, and space-partitioning options for building the tree.
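The batch-processing idea above can be illustrated with a minimal sketch that contrasts it with the brute-force baseline. The sketch makes several assumptions. Points are 2D and lie in the unit square, the tree is a region quadtree, and buckets have equal width `width`. A pair of cells is treated as "resolvable" when the minimum and maximum possible distances between them fall into the same bucket. In that case all of its point pairs are counted in one step, with no distance computed. Cell pairs that are still unresolved at the leaves fall back to direct distance computation. All names (`Node`, `distance_bounds`, `resolve`, `dual_tree_sdh`) are hypothetical. This is a simplified illustration of the technique, not the authors' implementation.

```python
import itertools
import math
import random


def bucket(d, width, num_buckets):
    """Map a distance to its equal-width bucket, clamping overflow into the last bucket."""
    return min(int(d // width), num_buckets - 1)


def brute_force_sdh(points, width, num_buckets):
    """Quadratic baseline: compute and bucket every one of the N(N-1)/2 distances."""
    hist = [0] * num_buckets
    for p, q in itertools.combinations(points, 2):
        hist[bucket(math.dist(p, q), width, num_buckets)] += 1
    return hist


class Node:
    """Square cell [x0, x0+size] x [y0, y0+size] of a region quadtree (hypothetical helper)."""

    def __init__(self, x0, y0, size, points, depth, max_depth):
        self.x0, self.y0, self.size, self.points = x0, y0, size, points
        self.children = []
        if depth < max_depth and len(points) > 1:
            half = size / 2
            quads = {}
            for p in points:
                # Quadrant index per axis: 0 = lower half, 1 = upper half.
                key = (int(p[0] >= x0 + half), int(p[1] >= y0 + half))
                quads.setdefault(key, []).append(p)
            # Only non-empty quadrants become children.
            for (qx, qy), sub in quads.items():
                self.children.append(
                    Node(x0 + qx * half, y0 + qy * half, half, sub, depth + 1, max_depth))


def distance_bounds(a, b):
    """Smallest and largest possible distance between a point of cell a and a point of cell b."""
    gaps, spans = [], []
    for a0, b0 in ((a.x0, b.x0), (a.y0, b.y0)):
        a1, b1 = a0 + a.size, b0 + b.size
        gaps.append(max(0.0, b0 - a1, a0 - b1))
        spans.append(max(b1 - a0, a1 - b0))
    return math.hypot(*gaps), math.hypot(*spans)


def resolve(a, b, same, width, hist, stats):
    """Add every pair (p in a, q in b) to hist; same=True means a is b (unordered pairs)."""
    num_buckets = len(hist)
    lo, hi = (bucket(d, width, num_buckets) for d in distance_bounds(a, b))
    na, nb = len(a.points), len(b.points)
    if lo == hi:
        # Resolvable: every distance lies in [dmin, dmax], which maps to one bucket.
        hist[lo] += na * (na - 1) // 2 if same else na * nb
        return
    if not a.children and not b.children:
        # Unresolved leaf cells: fall back to computing distances one by one.
        pairs = itertools.combinations(a.points, 2) if same else itertools.product(a.points, b.points)
        for p, q in pairs:
            hist[bucket(math.dist(p, q), width, num_buckets)] += 1
            stats["direct"] += 1
        return
    if same:
        # Pairs inside one child, then pairs split across two different children.
        for i, c in enumerate(a.children):
            resolve(c, c, True, width, hist, stats)
            for other in a.children[i + 1:]:
                resolve(c, other, False, width, hist, stats)
    else:
        # Descend one level; a leaf cell is paired with itself as its own "child".
        for ca in a.children or [a]:
            for cb in b.children or [b]:
                resolve(ca, cb, False, width, hist, stats)


def dual_tree_sdh(points, width, num_buckets, max_depth=5):
    """Hypothetical dual-tree SDH over the unit square; returns (histogram, #distances computed)."""
    root = Node(0.0, 0.0, 1.0, list(points), 0, max_depth)
    hist, stats = [0] * num_buckets, {"direct": 0}
    resolve(root, root, True, width, hist, stats)
    return hist, stats["direct"]


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(400)]
    width = 0.25
    num_buckets = int(math.sqrt(2) // width) + 1  # unit-square distances are at most sqrt(2)
    hist, direct = dual_tree_sdh(pts, width, num_buckets)
    assert hist == brute_force_sdh(pts, width, num_buckets)
    print(hist, "direct distances:", direct, "of", len(pts) * (len(pts) - 1) // 2)
```

Both functions return the same histogram. The tree version computes individual distances only for leaf-cell pairs it cannot resolve, and the returned counter reports how many that was. In this sketch, a deeper tree (larger `max_depth`) shrinks the cells, so more cell pairs become resolvable. The cost is more cell-pair visits.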