Faster and smaller inverted indices with treaps

Authors:
Roberto Konow;Gonzalo Navarro;Charles L.A. Clarke;Alejandro López-Ortiz
Affiliations:
University of Chile, Santiago, Chile;University of Chile, Santiago, Chile;University of Waterloo, Waterloo, ON, Canada;University of Waterloo, Waterloo, ON, Canada
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013

Abstract

We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using less space. Our index is based on the treap data structure, which allows us to intersect or merge the document identifiers while simultaneously thresholding by frequency, instead of using the costlier classical two-step processing methods. To achieve compression, we represent the treap topology using compact data structures. Further, the treap invariants allow us to encode both document identifiers and frequencies differentially. Results show that our index uses about 20% less space, and performs queries up to three times faster, than state-of-the-art compact representations.
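
The idea in the abstract can be illustrated with a minimal sketch. It assumes a posting list of (document identifier, frequency) pairs sorted by identifier, arranged as a treap that is a binary search tree on identifiers and a max-heap on frequencies. The names `Node`, `build`, `at_least`, and `diff_encode` are hypothetical and chosen only for this illustration; this is not the authors' implementation, and the compact encoding of the tree topology is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    doc: int                      # document identifier (search-tree key)
    freq: int                     # term frequency (heap priority)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(postings: List[Tuple[int, int]]) -> Optional[Node]:
    """Build a treap from postings sorted by doc id (simple, not optimized)."""
    if not postings:
        return None
    # The highest-frequency posting becomes the root (leftmost one on ties).
    i = max(range(len(postings)), key=lambda j: postings[j][1])
    doc, freq = postings[i]
    return Node(doc, freq, build(postings[:i]), build(postings[i + 1:]))


def at_least(node: Optional[Node], threshold: int, out: List[int]) -> List[int]:
    """Collect doc ids with freq >= threshold, in increasing doc order."""
    if node is None or node.freq < threshold:
        # Heap order: no descendant can have a higher frequency, so prune.
        return out
    at_least(node.left, threshold, out)
    out.append(node.doc)
    at_least(node.right, threshold, out)
    return out


def diff_encode(node: Optional[Node], parent: Optional[Node] = None,
                out: Optional[List[Tuple[int, int]]] = None) -> List[Tuple[int, int]]:
    """Emit (doc gap, freq gap) pairs in preorder; the root is stored as-is."""
    if out is None:
        out = []
    if node is None:
        return out
    if parent is None:
        out.append((node.doc, node.freq))
    else:
        # Both gaps are non-negative: the child is on a known side of the
        # parent, and its frequency cannot exceed the parent's.
        out.append((abs(node.doc - parent.doc), parent.freq - node.freq))
    diff_encode(node.left, node, out)
    diff_encode(node.right, node, out)
    return out


postings = [(1, 2), (3, 5), (4, 1), (7, 3), (9, 4)]
root = build(postings)
print(at_least(root, 3, []))   # documents with frequency at least 3
print(diff_encode(root))       # small non-negative gaps instead of raw values
```

In this sketch the frequency threshold is checked during the same traversal that visits the identifiers, so whole subtrees are skipped without a separate filtering pass. The differential values stay small because each node is encoded relative to its parent, which is the property a compressed representation would exploit.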