Bit-sliced index arithmetic

Authors:
Denis Rinfret;Patrick O'Neil;Elizabeth O'Neil
Affiliations:
UMass/Boston, Dept. of CS, UMass/Boston, Boston, MA;UMass/Boston & Microsoft Research, Dept. of CS, UMass/Boston, Boston, MA;-
Venue:
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Year:
2001

Citing 16
Cited 9

Automatic text processing: the transformation, analysis, and retrieval of information by computer

Automatic text processing: the transformation, analysis, and retrieval of information by computer
Digital design (2nd ed.)

Digital design (2nd ed.)
The MG retrieval system: compressing for space and speed

Communications of the ACM
A critique of ANSI SQL isolation levels

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Self-indexing inverted files for fast text retrieval

ACM Transactions on Information Systems (TOIS)
Improved query performance with variant indexes

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Bitmap index design and evaluation

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Exploring the similarity space

ACM SIGIR Forum
An efficient bitmap encoding scheme for selection queries

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Query optimization for selections using bitmaps

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Managing gigabytes (2nd ed.): compressing and indexing documents and images

Managing gigabytes (2nd ed.): compressing and indexing documents and images
Efficient passage ranking for document databases

ACM Transactions on Information Systems (TOIS)
Database (2nd ed.): principles, programming, and performance

Database (2nd ed.): principles, programming, and performance
Encoded Bitmap Indexing for Data Warehouses

ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Model 204 Architecture and Performance

Proceedings of the 2nd International Workshop on High Performance Transaction Systems
Clustering in massive data sets

Handbook of massive data sets

Bitmap-Based Indexing for Multi-dimensional Multimedia XML Documents

ICADL '02 Proceedings of the 5th International Conference on Asian Digital Libraries: Digital Libraries: People, Knowledge, and Technology
Indexing for progressive skyline computation

Data & Knowledge Engineering
RLH: bitmap compression technique based on run-length and huffman encoding

Proceedings of the ACM tenth international workshop on Data warehousing and OLAP
Answering preference queries with bit-sliced index arithmetic

Proceedings of the 2008 C3S2E conference
RLH: Bitmap compression technique based on run-length and Huffman encoding

Information Systems
Position list word aligned hybrid: optimizing space and performance for compressed bitmaps

Proceedings of the 13th International Conference on Extending Database Technology
Modern B-Tree Techniques

Foundations and Trends in Databases
On contextual ranking queries in databases

Information Systems
BitWeaving: fast scans for main memory data processing

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data


Abstract

The bit-sliced index (BSI) was originally defined in [ONQ97]. The current paper introduces the concept of BSI arithmetic. For any two BSI's X and Y on a table T, we show how to efficiently generate new BSI's Z, V, and W, such that Z = X + Y, V = X - Y, and W = MIN(X, Y); this means that if a row r in T has a value x represented in BSI X and a value y in BSI Y, the value for r in BSI Z will be x + y, the value in V will be x - y and the value in W will be MIN(x, y). Since a bitmap representing a set of rows is the simplest bit-sliced index, BSI arithmetic is the most straightforward way to determine multisets of rows (with duplicates) resulting from the SQL clauses UNION ALL (addition), EXCEPT ALL (subtraction), and INTERSECT ALL (min) (see [OO00, DB2SQL] for definitions of these clauses). Another contribution of the current paper is to generalize BSI range restrictions from [ONQ97] to a new non-Boolean form: to determine the top k BSI-valued rows, for any meaningful value k between one and the total number of rows in T. Together with bit-sliced addition, this permits us to solve a common basic problem of text retrieval: given an object-relational table T of rows representing documents, with a collection type column K representing keyword terms, we demonstrate an efficient algorithm to find k documents that share the largest number of terms with some query list Q of terms. A great deal of published work on such problems exists in the Information Retrieval (IR) field. The algorithm we introduce, which we call Bit-Sliced Term-Matching, or BSTM, uses an approach comparable in performance to the most efficient known IR algorithm, a major improvement on current DBMS text searching algorithms, with the advantage that it uses only indexing we propose for native database operations.
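
To make the ideas in the abstract concrete, the following is a minimal sketch of bit-sliced addition and top-k selection, and of how the two might combine for term matching. It is an illustration under stated assumptions, not the paper's algorithms: bitmaps are modeled as uncompressed Python integers (bit r stands for row r), values are non-negative, and every name here (`build_bsi`, `bsi_add`, `bsi_values`, `top_k`, `term_match`) is hypothetical. Subtraction and MIN are not shown.

```python
from typing import Dict, List

# Assumption: a BSI is a list of bitmaps, least significant slice first.
# Bit r of slice i is set when bit i of row r's value is 1.
BSI = List[int]

def build_bsi(values: List[int]) -> BSI:
    """Build a BSI from one non-negative value per row."""
    nbits = max(1, max(values, default=0).bit_length())
    slices = []
    for i in range(nbits):
        bitmap = 0
        for row, v in enumerate(values):
            if (v >> i) & 1:
                bitmap |= 1 << row
        slices.append(bitmap)
    return slices

def bsi_values(bsi: BSI, nrows: int) -> List[int]:
    """Decode a BSI back into per-row values (for checking only)."""
    return [sum(((s >> row) & 1) << i for i, s in enumerate(bsi))
            for row in range(nrows)]

def bsi_add(x: BSI, y: BSI) -> BSI:
    """Z = X + Y, computed slice by slice like a ripple-carry adder."""
    result, carry = [], 0
    for i in range(max(len(x), len(y))):
        a = x[i] if i < len(x) else 0
        b = y[i] if i < len(y) else 0
        result.append(a ^ b ^ carry)               # sum bit for all rows at once
        carry = (a & b) | (carry & (a ^ b))        # carry bit for all rows at once
    if carry:
        result.append(carry)
    return result

def top_k(bsi: BSI, nrows: int, k: int) -> int:
    """Bitmap of k rows with the largest values; ties are broken by lowest row."""
    g, e = 0, (1 << nrows) - 1   # g: rows surely in the result, e: rows still tied
    for b in reversed(bsi):      # scan from the most significant slice down
        x = g | (e & b)
        n = bin(x).count("1")
        if n > k:
            e &= b
        elif n < k:
            g, e = x, e & ~b
        else:
            return x
    need, row = k - bin(g).count("1"), 0
    while need > 0:              # fill remaining slots from the tied rows
        if (e >> row) & 1:
            g |= 1 << row
            need -= 1
        row += 1
    return g

def term_match(term_bitmaps: Dict[str, int], query: List[str],
               nrows: int, k: int) -> int:
    """Sum one bitmap per query term into a BSI of match counts, then take top k."""
    total: BSI = [0]
    for term in query:
        total = bsi_add(total, [term_bitmaps.get(term, 0)])
    return top_k(total, nrows, k)
```

As a usage example, with five documents and hypothetical term bitmaps `{"bitmap": 0b01011, "index": 0b01101, "query": 0b11001}`, calling `term_match` with all three terms and k = 2 selects documents 0 and 3, the only two that contain every query term. Each slice operation touches all rows at once, which is what makes the approach attractive when bitmaps are stored compactly.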