Training efficient tree-based models for document ranking

  • Authors:
  • Nima Asadi; Jimmy Lin

  • Affiliations:
  • Dept. of Computer Science, University of Maryland, College Park and Institute for Advanced Computer Studies, University of Maryland, College Park; Dept. of Computer Science, University of Maryland, College Park, Institute for Advanced Computer Studies, University of Maryland, College Park, and The iSchool, University of Maryland, College Park

  • Venue:
  • ECIR '13: Proceedings of the 35th European Conference on Advances in Information Retrieval
  • Year:
  • 2013


Abstract

Gradient-boosted regression trees (GBRTs) have proven to be an effective solution to the learning-to-rank problem. This work proposes and evaluates techniques for training GBRTs that have efficient runtime characteristics. Our approach is based on the simple observation that compact, shallow, and balanced trees yield faster predictions: thus, it makes sense to incorporate some notion of execution cost during training to "encourage" trees with these topological characteristics. We propose two strategies for accomplishing this: the first directly modifies the node splitting criterion during tree induction, and the second applies stagewise tree pruning. Experiments on a standard learning-to-rank dataset show that the pruning approach is superior; one balanced setting yields an approximately 40% decrease in prediction latency with minimal reduction in output quality as measured by NDCG.
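To make the first strategy concrete, here is a minimal sketch of a cost-augmented split criterion. It assumes the standard variance-reduction gain used in regression-tree induction and subtracts a depth penalty so that induction favors shallow, balanced topologies; the trade-off weight `lam` and the linear-in-depth penalty are illustrative assumptions, not parameters specified in the paper.

```python
import numpy as np

def variance_reduction(y, mask):
    """Standard regression-tree split gain: drop in label variance
    when the node's examples are partitioned by `mask`."""
    left, right = y[mask], y[~mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # degenerate split, no gain
    n = len(y)
    return (np.var(y)
            - (len(left) / n) * np.var(left)
            - (len(right) / n) * np.var(right))

def cost_aware_gain(y, mask, depth, lam=0.1):
    """Cost-augmented criterion (sketch of the paper's first strategy):
    penalize splits deeper in the tree, so a split must produce more
    gain to be accepted far from the root.  `lam` is a hypothetical
    knob trading ranking quality for prediction speed."""
    return variance_reduction(y, mask) - lam * depth
```

With this criterion, a split that clears the acceptance threshold at the root may no longer clear it at depth 5, which is exactly the kind of pressure toward compact trees the abstract describes.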