A Probabilistic Analysis of Trie-Based Sorting of Large Collections of Line Segments in Spatial Databases

  • Authors:
  • Michael Lindenbaum; Hanan Samet; Gisli R. Hjaltason

  • Venue:
  • SIAM Journal on Computing
  • Year:
  • 2005

Abstract

The size of the representations produced by five trie-based methods for sorting large collections of line segments in a spatial database is investigated analytically using a random lines image model and geometric probability techniques. The methods sort the line segments with respect to the space that they occupy. Since the space is two-dimensional, the trie is formed by interleaving the bits of the binary representations of the x and y coordinates of the underlying space and then testing two bits at each iteration. This formulation yields a class of representations referred to here as quadtrie variants, although they have traditionally been called quadtree variants. The analysis differs from prior work in that it uses a detailed, explicit model of the image instead of modeling the branching process represented by the tree and leaving the underlying image unspecified. It provides analytic expressions for, and bounds on, the expected size of these quadtree variants. This enables prediction of the storage required by the representations and of the performance of the algorithms that rely on them.
The results are useful in two ways. First, they reveal the properties of the various representations and permit their comparison using analytic, nonexperimental criteria. Some of the results confirm previous analyses (e.g., that the storage requirement of the MX quadtree is proportional to the total length of the line segments). An important new result is that for the PMR and Bucket PMR quadtrees with sufficiently high values of the splitting threshold (i.e., $\geq 4$), the number of nodes is proportional to the number of line segments and is independent of the maximum depth of the tree. This provides a theoretical justification for the good behavior and use of the PMR quadtree, which so far has been supported only empirically.
Second, the random lines model was found to be general enough to approximate real data, in the sense that the properties of the trie representations when used to store real data (e.g., maps) are similar to their properties when storing random lines data. Therefore, by specifying an equivalent random lines model for a real map, the proposed analytical expressions can be applied to predict the storage required for real data. Specifying the equivalent random lines model requires only an estimate of the effective number of random lines in it. Several such estimates are derived for real images, and the accuracy of the implied predictions is demonstrated on a real collection of maps. The agreement between the predictions and real data suggests that they could serve as the basis of a cost model that a query optimizer can use to generate an appropriate query evaluation plan.
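The bit interleaving described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names `interleave_bits` and `quadrant_path`, the choice to place the y bit above the x bit in each pair, and the fixed `depth` parameter are all assumptions made for illustration. The key is built by interleaving the coordinate bits, and reading it two bits at a time gives the sequence of quadrants visited on the way down the quadtrie.

```python
def interleave_bits(x: int, y: int, depth: int) -> int:
    """Interleave the low `depth` bits of x and y into one key.

    Assumed convention (not from the paper): at each level the y bit
    is the high bit of the pair and the x bit is the low bit.
    """
    code = 0
    for level in range(depth - 1, -1, -1):  # most significant bit first
        x_bit = (x >> level) & 1
        y_bit = (y >> level) & 1
        code = (code << 2) | (y_bit << 1) | x_bit
    return code


def quadrant_path(x: int, y: int, depth: int) -> list[int]:
    """Return the quadrant index (0..3) chosen at each level of the trie.

    Each step tests two bits of the interleaved key, which corresponds to
    one four-way branch of the quadtrie.
    """
    code = interleave_bits(x, y, depth)
    return [(code >> (2 * level)) & 0b11 for level in range(depth - 1, -1, -1)]


if __name__ == "__main__":
    # x = 5 (binary 101), y = 3 (binary 011) in an 8 x 8 grid (depth 3).
    print(interleave_bits(5, 3, 3))
    print(quadrant_path(5, 3, 3))
```

Under this convention, two cells whose keys share their first 2k bits lie in the same block at depth k, which is why sorting by the interleaved key groups data by the space it occupies.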