Parallelizing ListNet training using Spark

  • Authors:
  • Shilpa Shukla, Matthew Lease, Ambuj Tewari

  • Affiliations:
  • University of Texas at Austin, Austin, USA (all three authors)

  • Venue:
  • SIGIR '12: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2012

Abstract

As ever-larger training sets for learning to rank are created, the scalability of learning has become increasingly important to achieving continuing improvements in ranking accuracy. Exploiting the independence of "summation form" computations, we show how each iteration of ListNet gradient descent can benefit from parallel execution. We seek to draw the IR community's attention to Spark, a newly introduced distributed cluster-computing system, for reducing the training time of iterative learning-to-rank algorithms. Unlike MapReduce, Spark is especially well suited to iterative and interactive algorithms. Our results show a near-linear reduction in ListNet training time using Spark on Amazon EC2 clusters.
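The parallelization described in the abstract hinges on the ListNet gradient decomposing into a sum of independent per-query terms: each Spark worker computes its share of those terms, and the driver sums them and takes a gradient step. Below is a minimal, self-contained Scala sketch of that pattern for a linear scoring model. The `Query` type, the `queryGradient` helper, the synthetic data, and the hyperparameters are illustrative assumptions, not code from the paper.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ListNetSketch {
  // One training query: a feature vector per candidate document plus its
  // relevance label. (Illustrative type, not from the paper.)
  case class Query(features: Array[Array[Double]], labels: Array[Double])

  // Softmax, i.e. ListNet's top-one permutation probabilities.
  def softmax(s: Array[Double]): Array[Double] = {
    val m = s.max
    val e = s.map(v => math.exp(v - m))
    val z = e.sum
    e.map(_ / z)
  }

  // Gradient of ListNet's top-one cross-entropy loss for a single query
  // under a linear scoring model f(x) = w . x. These per-query gradients
  // are the independent "summation form" terms.
  def queryGradient(q: Query, w: Array[Double]): Array[Double] = {
    val scores = q.features.map(x => x.zip(w).map { case (a, b) => a * b }.sum)
    val pModel = softmax(scores)
    val pTrue  = softmax(q.labels)
    val g = new Array[Double](w.length)
    for (j <- q.features.indices; d <- w.indices)
      g(d) += (pModel(j) - pTrue(j)) * q.features(j)(d)
    g
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("ListNetSketch").setMaster("local[*]"))
    val rng = new scala.util.Random(0)

    // Synthetic stand-in for a learning-to-rank training set:
    // 100 queries, 10 documents each, 5 features per document.
    val queries = Seq.fill(100)(Query(
      Array.fill(10)(Array.fill(5)(rng.nextGaussian())),
      Array.fill(10)(rng.nextInt(3).toDouble)))

    // Caching keeps the training set in memory across iterations --
    // the property that makes Spark a better fit here than MapReduce.
    val data = sc.parallelize(queries).cache()

    var w   = new Array[Double](5)
    val eta = 0.01 // step size (arbitrary for this sketch)
    for (_ <- 1 to 50) {
      val wB = sc.broadcast(w) // ship current weights to every worker
      val grad = data.map(q => queryGradient(q, wB.value))
                     .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
      w = w.zip(grad).map { case (wi, gi) => wi - eta * gi }
      wB.destroy()
    }
    println("learned weights: " + w.mkString(", "))
    sc.stop()
  }
}
```

The broadcast/map/reduce sequence is one gradient-descent iteration expressed in Spark; because `data` is cached, subsequent iterations skip re-reading the input, which is the advantage over MapReduce that the abstract points to.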