Fast construction of the HYB index

Authors:
Hannah Bast; Marjan Celikik
Affiliations:
Albert Ludwigs University; Albert Ludwigs University
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2011

Abstract

As shown in a series of recent works, the HYB index is an alternative to the inverted index (INV) that enables very fast prefix searches, which in turn are the basis for fast processing of many other types of advanced queries, including autocompletion, faceted search, error-tolerant search, database-style select and join, and semantic search. In this work we show that HYB can be constructed at least as fast as INV, and often up to twice as fast. This is because HYB, by its nature, requires only a half-inversion of the data and allows an efficient in-place index construction instead of the traditional merge-based one. We also pay particular attention to the cache efficiency of the in-memory posting accumulation, an issue that has not been addressed in previous work, and show that our simple multilevel posting accumulation scheme yields far fewer cache misses than related approaches. Finally, we show that HYB supports fast dynamic index updates more easily than INV.
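To make the idea of a half-inversion concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a HYB-style index groups postings into blocks of consecutive word ids and keeps each block sorted by document id, with every posting carrying its word id. The function names, the `block_size` parameter, and the in-memory layout are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def build_half_inverted(docs, block_size=2):
    """Build a toy block index (hypothetical layout, for illustration only).

    docs maps doc id -> list of words. Each block covers block_size
    consecutive word ids and stores (doc_id, word_id) postings.
    """
    vocab = sorted({w for words in docs.values() for w in words})
    word_id = {w: i for i, w in enumerate(vocab)}
    blocks = defaultdict(list)
    # Documents are visited in increasing doc id, so every block list ends up
    # sorted by doc id through plain appends; no per-word inversion or merge
    # step is needed (the "half" in half-inversion, under our assumption).
    for doc_id in sorted(docs):
        for w in sorted(set(docs[doc_id])):
            wid = word_id[w]
            blocks[wid // block_size].append((doc_id, wid))
    return vocab, dict(blocks)


def prefix_search(vocab, blocks, prefix, block_size=2):
    """Return the sorted doc ids containing at least one word with this prefix."""
    # Words sharing a prefix form a contiguous range [lo, hi) in sorted order.
    lo = bisect_left(vocab, prefix)
    hi = lo
    while hi < len(vocab) and vocab[hi].startswith(prefix):
        hi += 1
    if lo == hi:
        return []
    result = set()
    # Scan only the blocks that overlap the word-id range, filtering by word id.
    for b in range(lo // block_size, (hi - 1) // block_size + 1):
        for doc_id, wid in blocks.get(b, []):
            if lo <= wid < hi:
                result.add(doc_id)
    return sorted(result)
```

In this sketch a prefix query touches a few contiguous blocks rather than one inverted list per matching word, which is the intuition behind fast prefix search. A real index would compress the blocks and choose block boundaries by size rather than by a fixed word count.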