Parallelizing ListNet training using Spark

  • Authors:
  • Shilpa Shukla, Matthew Lease, Ambuj Tewari

  • Affiliations:
  • University of Texas at Austin, Austin, USA (all three authors)

  • Venue:
  • SIGIR '12: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2012

Abstract

As ever-larger training sets for learning to rank are created, the scalability of learning has become increasingly important to achieving continuing improvements in ranking accuracy. Exploiting the independence of "summation form" computations, we show how each iteration of ListNet gradient descent can benefit from parallel execution. We seek to draw the IR community's attention to Spark, a newly introduced distributed cluster-computing system, for reducing the training time of iterative learning-to-rank algorithms. Unlike MapReduce, Spark is especially well suited to iterative and interactive algorithms. Our results show a near-linear reduction in ListNet training time using Spark on Amazon EC2 clusters.
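The parallelization described in the abstract hinges on the ListNet gradient decomposing into a sum of independent per-query terms: each Spark worker computes its share of those terms, and the driver sums them and takes a gradient step. Below is a minimal, self-contained Scala sketch of that pattern for a linear scoring model. The `Query` type, the `queryGradient` helper, the synthetic data, and the hyperparameters are illustrative assumptions, not code from the paper.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ListNetSketch {
  // One training query: a feature vector per candidate document plus its
  // relevance label. (Illustrative type, not from the paper.)
  case class Query(features: Array[Array[Double]], labels: Array[Double])

  // Softmax, i.e. ListNet's top-one permutation probabilities.
  def softmax(s: Array[Double]): Array[Double] = {
    val m = s.max
    val e = s.map(v => math.exp(v - m))
    val z = e.sum
    e.map(_ / z)
  }

  // Gradient of ListNet's top-one cross-entropy loss for a single query
  // under a linear scoring model f(x) = w . x. These per-query gradients
  // are the independent "summation form" terms.
  def queryGradient(q: Query, w: Array[Double]): Array[Double] = {
    val scores = q.features.map(x => x.zip(w).map { case (a, b) => a * b }.sum)
    val pModel = softmax(scores)
    val pTrue  = softmax(q.labels)
    val g = new Array[Double](w.length)
    for (j <- q.features.indices; d <- w.indices)
      g(d) += (pModel(j) - pTrue(j)) * q.features(j)(d)
    g
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("ListNetSketch").setMaster("local[*]"))
    val rng = new scala.util.Random(0)

    // Synthetic stand-in for a learning-to-rank training set:
    // 100 queries, 10 documents each, 5 features per document.
    val queries = Seq.fill(100)(Query(
      Array.fill(10)(Array.fill(5)(rng.nextGaussian())),
      Array.fill(10)(rng.nextInt(3).toDouble)))

    // Caching keeps the training set in memory across iterations --
    // the property that makes Spark a better fit here than MapReduce.
    val data = sc.parallelize(queries).cache()

    var w   = new Array[Double](5)
    val eta = 0.01 // step size (arbitrary for this sketch)
    for (_ <- 1 to 50) {
      val wB = sc.broadcast(w) // ship current weights to every worker
      val grad = data.map(q => queryGradient(q, wB.value))
                     .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
      w = w.zip(grad).map { case (wi, gi) => wi - eta * gi }
      wB.destroy()
    }
    println("learned weights: " + w.mkString(", "))
    sc.stop()
  }
}
```

The broadcast/map/reduce sequence is one gradient-descent iteration expressed in Spark; because `data` is cached, subsequent iterations skip re-reading the input, which is the advantage over MapReduce that the abstract points to.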