Flexible and efficient IR using array databases

Authors:
Roberto Cornacchia;Sándor Héman;Marcin Zukowski;Arjen P. de Vries;Peter Boncz
Affiliations:
CWI, Amsterdam, The Netherlands 1098 SJ;CWI, Amsterdam, The Netherlands 1098 SJ;CWI, Amsterdam, The Netherlands 1098 SJ;CWI, Amsterdam, The Netherlands 1098 SJ;CWI, Amsterdam, The Netherlands 1098 SJ
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2008

Citing: 0
Cited by: 9

An architecture for recycling intermediates in a column-store
    Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Database architecture evolution: mammals flourished long before dinosaurs became extinct
    Proceedings of the VLDB Endowment

The Data Cyclotron query processing scheme
    Proceedings of the 13th International Conference on Extending Database Technology

An architecture for recycling intermediates in a column-store
    ACM Transactions on Database Systems (TODS)

Search by strategy
    ESAIR '10 Proceedings of the third workshop on Exploiting semantic annotations in information retrieval

SciQL, a query language for science applications
    Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases

SciQL: bridging the gap between science and relational DBMS
    Proceedings of the 15th Symposium on International Database Engineering & Applications

A candidate filtering mechanism for fast top-k query processing on modern cpus
    Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Astronomical data processing in EXTASCID
    Proceedings of the 25th International Conference on Scientific and Statistical Database Management

Abstract

The Matrix Framework is a recent proposal by Information Retrieval (IR) researchers to flexibly represent information retrieval models and concepts in a single multi-dimensional array framework. We provide computational support for exactly this framework with the array database system SRAM (Sparse Relational Array Mapping), which works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language, in a way that directly corresponds to the underlying mathematical formulas. SRAM efficiently stores sparse arrays in (compressed) relational tables, and it translates array queries into relational queries and optimizes them. In this work, we describe a number of array query optimization rules. To demonstrate their effect on text retrieval, we apply them in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. These optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing, including compression, score materialization and quantization, as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, which provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage.
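
To make the idea of evaluating BM25 over a sparse array stored as a relational table more concrete, the following is a minimal Python sketch. It is not SRAM's query language or the authors' implementation. The tuple layout `(doc, term, tf)`, the function name `bm25_scores`, the sample data, the parameter defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant are all assumptions made for illustration. Zero cells of the term-frequency array are simply absent from the table, so scanning the rows for the query terms resembles reading inverted lists.

```python
import math
from collections import defaultdict

# Hypothetical sparse term-frequency array TF[doc, term], stored as relational
# tuples (doc, term, tf). Cells with tf == 0 are not stored at all.
TF = [
    ("d1", "array", 3), ("d1", "database", 1),
    ("d2", "database", 2), ("d2", "retrieval", 1),
    ("d3", "retrieval", 4),
]

def bm25_scores(tf_rows, query_terms, k1=1.2, b=0.75):
    """Score documents with Okapi BM25 using only scans and aggregations.

    Assumption: the IDF uses the non-negative variant log(1 + (N - df + 0.5) / (df + 0.5)).
    """
    doc_len = defaultdict(int)   # aggregate: sum(tf) grouped by doc
    doc_freq = defaultdict(int)  # aggregate: count(*) grouped by term
    for doc, term, tf in tf_rows:
        doc_len[doc] += tf
        doc_freq[term] += 1

    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs

    query = set(query_terms)
    scores = defaultdict(float)
    # Selection on the query terms, then a grouped sum per document.
    for doc, term, tf in tf_rows:
        if term not in query:
            continue
        df = doc_freq[term]
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf + k1 * (1.0 - b + b * doc_len[doc] / avg_len)
        scores[doc] += idf * tf * (k1 + 1.0) / norm
    return dict(scores)

if __name__ == "__main__":
    ranked = sorted(bm25_scores(TF, ["retrieval"]).items(),
                    key=lambda item: item[1], reverse=True)
    for doc, score in ranked:
        print(doc, round(score, 3))
```

In this sketch only documents that contain a query term receive a score, which is the property that lets a relational plan touch just the relevant rows. Score materialization and quantization, as named in the abstract, would correspond to precomputing the per-row contribution and storing it in a compact integer form; that step is omitted here.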