Automatic text processing: the transformation, analysis, and retrieval of information by computer
Automatic text processing: the transformation, analysis, and retrieval of information by computer
Digital design (2nd ed.)
The MG retrieval system: compressing for space and speed
Communications of the ACM
A critique of ANSI SQL isolation levels
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Self-indexing inverted files for fast text retrieval
ACM Transactions on Information Systems (TOIS)
Improved query performance with variant indexes
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Bitmap index design and evaluation
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Exploring the similarity space
ACM SIGIR Forum
An efficient bitmap encoding scheme for selection queries
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Query optimization for selections using bitmaps
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Efficient passage ranking for document databases
ACM Transactions on Information Systems (TOIS)
Database (2nd ed.): principles, programming, and performance
Database (2nd ed.): principles, programming, and performance
Encoded Bitmap Indexing for Data Warehouses
ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Model 204 Architecture and Performance
Proceedings of the 2nd International Workshop on High Performance Transaction Systems
Clustering in massive data sets
Handbook of massive data sets
Bitmap-Based Indexing for Multi-dimensional Multimedia XML Documents
ICADL '02 Proceedings of the 5th International Conference on Asian Digital Libraries: Digital Libraries: People, Knowledge, and Technology
Indexing for progressive skyline computation
Data & Knowledge Engineering
RLH: bitmap compression technique based on run-length and huffman encoding
Proceedings of the ACM tenth international workshop on Data warehousing and OLAP
Answering preference queries with bit-sliced index arithmetic
Proceedings of the 2008 C3S2E conference
RLH: Bitmap compression technique based on run-length and Huffman encoding
Information Systems
Position list word aligned hybrid: optimizing space and performance for compressed bitmaps
Proceedings of the 13th International Conference on Extending Database Technology
Foundations and Trends in Databases
On contextual ranking queries in databases
Information Systems
BitWeaving: fast scans for main memory data processing
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Hi-index | 0.00 |
The bit-sliced index (BSI) was originally defined in [ONQ97]. The current paper introduces the concept of BSI arithmetic. For any two BSI's X and Y on a table T, we show how to efficiently generate new BSI's Z, V, and W, such that Z = X + Y, V = X - Y, and W = MIN(X, Y); this means that if a row r in T has a value x represented in BSI X and a value y in BSI Y, the value for r in BSI Z will be x + y, the value in V will be x - y and the value in W will be MIN(x, y). Since a bitmap representing a set of rows is the simplest bit-sliced index, BSI arithmetic is the most straightforward way to determine multisets of rows (with duplicates) resulting from the SQL clauses UNION ALL (addition), EXCEPT ALL (subtraction), and INTERSECT ALL (min) (see [OO00, DB2SQL] for definitions of these clauses). Another contribution of the current paper is to generalize BSI range restrictions from [ONQ97] to a new non-Boolean form: to determine the top k BSI-valued rows, for ally meaningful value k between one and the total number of rows in T. Together with bit-sliced addition, this permits us to solve a common basic problem of text retrieval: given an object-relational table T of rows representing documents, with a collection type column K representing keyword terms, we demonstrate an efficient algorithm to find k documents that share the largest number of terms with some query list Q of terms. A great deal of published work on such problems exists in the Information Retrieval (IR) field. The algorithm we introduce, which we call Bit-Sliced Term-Matching, or BSTM, uses an approach comparable in performance to the most efficient known IR algorithm, a major improvement on current DBMS text searching algorithms, with the advantage that it uses only indexing we propose for native database operations.