Efficient parallel lists intersection and index compression algorithms using graphics processing units

Authors:
Naiyong Ao;Fan Zhang;Di Wu;Douglas S. Stones;Gang Wang;Xiaoguang Liu;Jing Liu;Sheng Lin
Affiliations:
Nankai University;Nankai University;Nankai University;Monash University;Nankai University;Nankai University;Nankai University;Nankai University
Venue:
Proceedings of the VLDB Endowment
Year:
2011

Abstract

Major web search engines answer thousands of queries per second requesting information about billions of web pages. The data sizes and query loads are growing at an exponential rate. To manage the heavy workload, we consider techniques for utilizing a Graphics Processing Unit (GPU). We investigate new approaches to improve two important operations of search engines: lists intersection and index compression. For lists intersection, we develop techniques for efficiently implementing the binary search algorithm in parallel. We inspect some representative real-world datasets and find that a sufficiently long inverted list has an overall linear rate of increase. Based on this observation, we propose Linear Regression and Hash Segmentation techniques for contracting the search range. For index compression, the traditional d-gap-based compression schemata are not well-suited for parallel computation, so we propose a Linear Regression Compression schema which has an inherent parallel structure. We further discuss how to efficiently intersect the compressed lists on a GPU. Our experimental results show significant improvements in the query processing throughput on several datasets.
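
The abstract does not give implementation details, so the following is only a minimal CPU-side sketch of the two linear-regression ideas it names, not the authors' GPU code. It assumes a non-empty, strictly increasing list of document IDs. All function names (`fit_line`, `lr_compress`, `lr_decode_at`, `contracted_search`, `intersect`) are hypothetical. A least-squares line is fitted to (position, docID) pairs; each docID is stored as its residual from the line, and the smallest and largest residuals bound the positions at which a target docID can occur.

```python
import math
from bisect import bisect_left


def fit_line(postings):
    """Least-squares fit docID ~ slope * position + intercept (non-empty list)."""
    n = len(postings)
    mean_i = (n - 1) / 2
    mean_v = sum(postings) / n
    var_i = sum((i - mean_i) ** 2 for i in range(n))
    cov = sum((i - mean_i) * (v - mean_v) for i, v in enumerate(postings))
    slope = cov / var_i if var_i else 0.0
    return slope, mean_v - slope * mean_i


def lr_compress(postings):
    """Store each docID as its (small) offset from the fitted line."""
    slope, intercept = fit_line(postings)
    residuals = [v - math.floor(slope * i + intercept)
                 for i, v in enumerate(postings)]
    return slope, intercept, residuals


def lr_decode_at(slope, intercept, residuals, i):
    """Decode position i alone; unlike d-gaps, no prefix sum is required."""
    return math.floor(slope * i + intercept) + residuals[i]


def contracted_search(postings, slope, intercept, r_min, r_max, target):
    """Binary search restricted to the positions the line allows for target."""
    n = len(postings)
    lo, hi = 0, n
    if slope > 0:
        # If postings[i] == target, the prediction at i lies in
        # [target - r_max, target - r_min]; invert the line to bound i.
        # One extra slot on each side absorbs floating-point rounding.
        lo = max(0, math.floor((target - r_max - intercept) / slope) - 1)
        hi = min(n, math.ceil((target - r_min + 1 - intercept) / slope) + 1)
    if lo >= hi:
        return -1
    pos = bisect_left(postings, target, lo, hi)
    return pos if pos < hi and postings[pos] == target else -1


def intersect(short_list, long_list):
    """Look up every element of the short list in the long list.

    The lookups are independent, so on a GPU each could run in its own
    thread; a list comprehension stands in for that here.
    """
    slope, intercept, residuals = lr_compress(long_list)
    r_min, r_max = min(residuals), max(residuals)
    return [x for x in short_list
            if contracted_search(long_list, slope, intercept,
                                 r_min, r_max, x) >= 0]
```

Because every residual is decoded from its position alone, the same fitted line serves both compression and range contraction. How the residuals would be bit-packed, and how the work is mapped onto GPU threads, is left out of this sketch.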