Which sort orders are interesting?

  • Authors:
  • Ravindra Guravannavar; S. Sudarshan; Ajit A. Diwan; Ch. Sobhan Babu

  • Affiliations:
  • Department of Computer Science and Engineering, Indian Institute of Technology, Hyderabad, India; Department of Computer Science and Engineering, Indian Institute of Technology, Bombay, Mumbai, India; Department of Computer Science and Engineering, Indian Institute of Technology, Bombay, Mumbai, India; Department of Computer Science and Engineering, Indian Institute of Technology, Hyderabad, India

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2012

Abstract

Sort orders play an important role in query evaluation. Algorithms that rely on sorting are widely used to implement joins, grouping, duplicate elimination and other set operations. The notion of interesting orders has allowed query optimizers to consider plans that could be locally sub-optimal, but produce ordered output beneficial for other operators, and thus be part of a globally optimal plan. However, the number of interesting orders for most operators is factorial in the number of attributes involved. Optimizer implementations use heuristics to prune the number of interesting orders, but the quality of the heuristics is unclear. Increasingly complex decision support queries and increasing use of query-covering indices, which provide multiple alternative sort orders for relations, motivate us to better address the problem of choosing interesting orders. We show that even a simplified version of the problem is NP-hard and provide a 1/2-benefit approximation algorithm for a special case of the problem. We then present principled heuristics for the general case of choosing interesting orders. We have implemented the proposed techniques in a Volcano-style cost-based optimizer, and our performance study shows significant improvements in estimated cost. We also executed our plans on a widely used commercial database system, and on PostgreSQL, and found that actual execution times for our plans were significantly better than for plans generated by those systems in several cases.
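
To make the factorial growth mentioned in the abstract concrete, the following is a minimal illustrative sketch and not the algorithm from the paper. It assumes a merge join on a set of attributes where any permutation of those attributes is a valid sort order, and it uses a hypothetical scoring rule (longest common prefix with an already available order, such as one provided by an index) to pick a candidate. The function names and the scoring rule are assumptions made for illustration only.

```python
from itertools import permutations
from math import factorial


def candidate_sort_orders(attrs):
    """Return every permutation of attrs as a candidate sort order."""
    return list(permutations(attrs))


def common_prefix_len(order_a, order_b):
    """Length of the longest common prefix of two sort orders."""
    n = 0
    for a, b in zip(order_a, order_b):
        if a != b:
            break
        n += 1
    return n


# Hypothetical join on three attributes: any permutation is a usable order.
join_attrs = ("a", "b", "c")
orders = candidate_sort_orders(join_attrs)

# The number of candidates is k! for k attributes.
assert len(orders) == factorial(len(join_attrs))

# Hypothetical order already available on one input (e.g. from an index).
available = ("b", "a")

# Assumed scoring rule: prefer the candidate sharing the longest prefix
# with the available order, since less additional sorting would be needed.
best = max(orders, key=lambda o: common_prefix_len(o, available))
print(len(orders), best)
```

Even at ten attributes this enumeration yields 3,628,800 candidates per operator, which is why optimizers prune the set with heuristics rather than enumerate it.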