Major web search engines answer thousands of queries per second requesting information about billions of web pages. The data sizes and query loads are growing at an exponential rate. To manage the heavy workload, we consider techniques for utilizing a Graphics Processing Unit (GPU). We investigate new approaches to improve two important operations of search engines: list intersection and index compression. For list intersection, we develop techniques for efficiently implementing the binary search algorithm for parallel computation. We inspect some representative real-world datasets and find that the values in a sufficiently long inverted list increase at a roughly linear rate overall. Based on this observation, we propose Linear Regression and Hash Segmentation techniques for contracting the search range. For index compression, the traditional d-gap based compression schemes are not well suited to parallel computation, so we propose a Linear Regression Compression scheme that has an inherently parallel structure. We further discuss how to efficiently intersect the compressed lists on a GPU. Our experimental results show significant improvements in query processing throughput on several datasets.
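The regression idea can be illustrated with a small CPU-side sketch. This is not the authors' implementation, and every name in it (`LRList`, `compress`, `decode_at`, `search`, `intersect`) is hypothetical. It assumes a non-empty, strictly increasing list of integer document IDs, fits a least-squares line to (index, value) pairs, and stores only each value's integer offset from that line. Unlike d-gaps, each entry can then be decoded independently, and the smallest and largest offsets bound the index range a binary search has to cover. On a GPU each lookup would be assigned to its own thread; here the lookups simply run in a loop.

```python
import math
from typing import List, NamedTuple


class LRList(NamedTuple):
    """Hypothetical container: a fitted line plus per-element integer offsets."""
    slope: float
    intercept: float
    offsets: List[int]
    off_min: int
    off_max: int


def _predict(slope: float, intercept: float, i: int) -> int:
    # Integer position of the regression line at index i.
    return math.floor(slope * i + intercept)


def compress(values: List[int]) -> LRList:
    """Fit values[i] ~ slope * i + intercept and keep only the offsets.

    Assumes `values` is non-empty and strictly increasing.
    """
    n = len(values)
    mean_i = (n - 1) / 2
    mean_v = sum(values) / n
    var_i = sum((i - mean_i) ** 2 for i in range(n))
    cov = sum((i - mean_i) * (v - mean_v) for i, v in enumerate(values))
    slope = cov / var_i if var_i else 1.0  # single-element list: any slope works
    intercept = mean_v - slope * mean_i
    offsets = [v - _predict(slope, intercept, i) for i, v in enumerate(values)]
    return LRList(slope, intercept, offsets, min(offsets), max(offsets))


def decode_at(packed: LRList, i: int) -> int:
    """Recover element i on its own, with no prefix sum over earlier entries."""
    return _predict(packed.slope, packed.intercept, i) + packed.offsets[i]


def search(packed: LRList, target: int) -> int:
    """Return the index of `target`, or -1 if it is absent.

    The offset bounds contract the binary search to a narrow index window;
    one extra slot on each side absorbs floating-point rounding.
    """
    n = len(packed.offsets)
    lo = math.floor((target - packed.off_max - packed.intercept) / packed.slope) - 1
    hi = math.ceil((target - packed.off_min + 1 - packed.intercept) / packed.slope) + 1
    lo, hi = max(0, lo), min(n, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if decode_at(packed, mid) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < n and decode_at(packed, lo) == target else -1


def intersect(short_list: List[int], packed: LRList) -> List[int]:
    """Keep the elements of the shorter list that also occur in the packed list."""
    return [x for x in short_list if search(packed, x) != -1]
```

The closer the list is to a straight line, the smaller the spread between `off_min` and `off_max`, so the search window shrinks and the offsets need fewer bits. The sketch stores the offsets as plain Python integers and leaves out any bit-packing step.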