MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Learning to rank: from pairwise approach to listwise approach
Proceedings of the 24th international conference on Machine learning
Spark: cluster computing with working sets
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Parallel learning to rank for information retrieval
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
As ever-larger training sets for learning to rank are created, scalability of learning has become increasingly important to achieving continuing improvements in ranking accuracy. Exploiting independence of "summation form" computations, we show how each iteration in ListNet gradient descent can benefit from parallel execution. We seek to draw the attention of the IR community to use Spark, a newly introduced distributed cluster computing system, for reducing training time of iterative learning to rank algorithms. Unlike MapReduce, Spark is especially suited for iterative and interactive algorithms. Our results show near linear reduction in ListNet training time using Spark on Amazon EC2 clusters.