A fast algorithm for constructing inverted files on heterogeneous platforms

Authors: Zheng Wei; Joseph JaJa
Venue: Journal of Parallel and Distributed Computing
Year: 2012

Abstract

Given a collection of documents residing on disk, we develop a new strategy for processing these documents and building the inverted files at very high throughput. Our approach is tailored to a heterogeneous platform consisting of multicore CPUs and highly multithreaded GPUs. The algorithm is based on a number of novel techniques, including a high-throughput pipelined strategy, a hybrid trie and B-tree dictionary data structure, dynamic allocation of work to CPU and GPU threads, and an optimized CUDA indexer implementation. We have performed extensive tests of our algorithm on a single node (two Intel Xeon X5560 quad-core CPUs) with two NVIDIA Tesla C1060 GPUs attached, and achieved a throughput of more than 262 MB/s on the ClueWeb09 dataset. Similar results were obtained on substantially different datasets. The throughput of our algorithm exceeds that of the best known algorithms reported in the literature, even those run on large clusters.
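To make the idea of a hybrid trie and B-tree dictionary concrete, here is a minimal, sequential, CPU-only sketch in Python. It is not the authors' implementation. Several details are assumptions chosen purely for illustration: the trie resolves only the first `TRIE_DEPTH = 2` characters of each term, each trie leaf holds a sorted term list searched with `bisect` as a stand-in for a B-tree, the tokenizer is a simple lowercase alphanumeric regex, and documents arrive in increasing document-ID order. The class and function names (`LeafNode`, `HybridDictionary`, `build_inverted_file`) are hypothetical. The paper's pipelining, CPU/GPU work allocation, and CUDA indexer are not modeled here.

```python
import bisect
import re

TRIE_DEPTH = 2  # hypothetical: number of leading characters resolved by the trie


class LeafNode:
    """Sorted term list standing in for a B-tree at one trie leaf (assumption)."""

    def __init__(self):
        self.terms = []     # sorted term strings
        self.postings = []  # postings[i] is the doc-ID list for terms[i]

    def add(self, term, doc_id):
        i = bisect.bisect_left(self.terms, term)
        if i == len(self.terms) or self.terms[i] != term:
            # New term: insert it in sorted position with an empty postings list.
            self.terms.insert(i, term)
            self.postings.insert(i, [])
        plist = self.postings[i]
        # Assumes docs arrive in increasing ID order, so checking the last
        # entry is enough to avoid duplicate doc IDs in a postings list.
        if not plist or plist[-1] != doc_id:
            plist.append(doc_id)

    def lookup(self, term):
        i = bisect.bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return self.postings[i]
        return []


class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.leaf = None    # LeafNode for terms whose trie path ends here


class HybridDictionary:
    """Trie over the first `depth` characters; a sorted leaf per prefix."""

    def __init__(self, depth=TRIE_DEPTH):
        self.depth = depth
        self.root = TrieNode()

    def _walk(self, term, create):
        node = self.root
        for ch in term[: self.depth]:
            child = node.children.get(ch)
            if child is None:
                if not create:
                    return None
                child = node.children[ch] = TrieNode()
            node = child
        if node.leaf is None and create:
            node.leaf = LeafNode()
        return node.leaf

    def add(self, term, doc_id):
        self._walk(term, create=True).add(term, doc_id)

    def lookup(self, term):
        leaf = self._walk(term, create=False)
        return leaf.lookup(term) if leaf is not None else []


def build_inverted_file(docs):
    """docs: iterable of (doc_id, text) pairs in increasing doc_id order."""
    dictionary = HybridDictionary()
    for doc_id, text in docs:
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            dictionary.add(term, doc_id)
    return dictionary


if __name__ == "__main__":
    index = build_inverted_file([
        (1, "GPU inverted files"),
        (2, "inverted index on the GPU"),
        (3, "trie and B-tree"),
    ])
    print(index.lookup("gpu"))   # [1, 2]
    print(index.lookup("tree"))  # [3]
    print(index.lookup("cpu"))   # []
```

The design point the sketch tries to convey is the split of work: the shallow trie cheaply routes each term to a small bucket sharing its prefix (here "tree" and "trie" both land in the "tr" leaf), and the sorted per-leaf structure handles the remaining comparisons. A real B-tree would replace the `list.insert` calls, which are linear in the leaf size.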