In modern Web search engines, Neural Network (NN)-based learning-to-rank algorithms are widely used to improve the quality of search results. LambdaRank is one such algorithm. However, it is hard to accelerate efficiently on computer clusters or GPUs because: (i) the cost function of the ranking problem is much more complex than that of traditional Back-Propagation (BP) NNs, and (ii) the algorithm contains no coarse-grained parallelism. This article presents an FPGA-based accelerator that provides high computing performance with low power consumption. A compact deep pipeline is proposed to handle the complex computation in batch updating, and its area scales linearly with the number of hidden nodes in the network. We also carefully design a data format that enables streaming consumption of the training data from the host computer. The accelerator achieves up to 15.3X (with PCIe x4) and 23.9X (with PCIe x8) speedup over a pure software implementation on datasets from a commercial search engine.
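To illustrate why the ranking cost is harder to pipeline than a plain BP loss, the following is a minimal sketch of the pairwise lambda-gradient computation used in LambdaRank-style training, assuming NDCG as the target metric. The function names, the 0-based rank discounting, and the sign convention are illustrative assumptions, not the paper's implementation.

```python
import math

def delta_ndcg(rel, ranks, i, j, idcg):
    # |change in NDCG| if the documents at ranks[i] and ranks[j] swapped positions
    gain = lambda r: (2 ** r - 1)
    disc = lambda pos: 1.0 / math.log2(pos + 2)  # 0-based rank discount (assumption)
    before = gain(rel[i]) * disc(ranks[i]) + gain(rel[j]) * disc(ranks[j])
    after = gain(rel[i]) * disc(ranks[j]) + gain(rel[j]) * disc(ranks[i])
    return abs(after - before) / idcg

def lambdas(scores, rel, sigma=1.0):
    """Per-document lambda gradients for one query (pairwise LambdaRank form)."""
    n = len(scores)
    order = sorted(range(n), key=lambda k: -scores[k])  # current ranking by score
    ranks = [0] * n
    for pos, k in enumerate(order):
        ranks[k] = pos
    ideal = sorted(rel, reverse=True)
    idcg = sum((2 ** r - 1) / math.log2(p + 2) for p, r in enumerate(ideal)) or 1.0
    lam = [0.0] * n
    for i in range(n):
        for j in range(n):
            if rel[i] > rel[j]:  # document i should rank above document j
                w = delta_ndcg(rel, ranks, i, j, idcg)
                # metric-weighted pairwise logistic gradient
                l = -sigma * w / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
                lam[i] += l
                lam[j] -= l
    return lam
```

Unlike a per-sample BP loss, every lambda couples all document pairs of a query through the current ranking and the metric weight, which is what rules out the usual coarse-grained data parallelism and motivates a deep pipeline instead.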