Maximal vector computation in large data sets

  • Authors:
  • Parke Godfrey;Ryan Shipley;Jarek Gryz

  • Affiliations:
  • York University, Toronto, ON, Canada;The College of William and Mary, Williamsburg, VA;York University, Toronto, ON, Canada

  • Venue:
  • VLDB '05 Proceedings of the 31st international conference on Very large data bases
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Finding the maximals in a collection of vectors is relevant to many applications. The maximal set is related to the convex hull---and hence, linear optimization---and nearest neighbors. The maximal vector problem has resurfaced with the advent of skyline queries for relational databases and skyline algorithms that are external and relationally well behaved.The initial algorithms proposed for maximals are based on divide-and-conquer. These established good average and worst case asymptotic running times, showing it to be O(n) average-case. where n is the number of vectors. However, they are not amenable to externalizing. We prove, furthermore, that their performance is quite bad with respect to the dimensionality, k, of the problem. We demonstrate that the more recent external skyline algorithms are actually better behaved, although they do not have as good an apparent asymptotic complexity. We introduce a new external algorithm, LESS, that combines the best features of these. experimentally evaluate its effectiveness and improvement over the field, and prove its average-case running time is O(kn).