Flexible and efficient IR using array databases

  • Authors:
  • Roberto Cornacchia; Sándor Héman; Marcin Zukowski; Arjen P. de Vries; Peter Boncz

  • Affiliations:
  • CWI, Amsterdam, The Netherlands 1098 SJ;CWI, Amsterdam, The Netherlands 1098 SJ;CWI, Amsterdam, The Netherlands 1098 SJ;CWI, Amsterdam, The Netherlands 1098 SJ;CWI, Amsterdam, The Netherlands 1098 SJ

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2008

Abstract

The Matrix Framework is a recent proposal by Information Retrieval (IR) researchers to flexibly represent information retrieval models and concepts in a single multi-dimensional array framework. We provide computational support for exactly this framework with the array database system SRAM (Sparse Relational Array Mapping), which works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language in a way that corresponds directly to the underlying mathematical formulas. SRAM stores sparse arrays efficiently in (compressed) relational tables, translates array queries into relational queries, and optimizes them. In this work, we describe a number of array query optimization rules. To demonstrate their effect on text retrieval, we apply them in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing, including compression, score materialization and quantization, as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, which provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage.
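
To make the idea of a sparse array stored as a relational table more concrete, the following is a minimal Python sketch and not SRAM's query language or the paper's implementation. It assumes a term-frequency array kept as `(term, doc, tf)` tuples with zero cells omitted, and it scores documents with one common BM25 formulation. The function name `bm25_scores`, the sample data, the defaults `k1=1.2` and `b=0.75`, and the particular IDF variant are all assumptions made for illustration; the abstract does not state which variant the authors use.

```python
import math
from collections import defaultdict

# Sparse array TF[term, doc] held as relational tuples (term, doc, tf).
# Cells with tf == 0 are simply absent. Hypothetical sample data.
TF = [
    ("array", 0, 2),
    ("database", 0, 1),
    ("array", 1, 1),
    ("retrieval", 1, 3),
    ("database", 2, 1),
]


def bm25_scores(tf_rows, query_terms, k1=1.2, b=0.75):
    """Score documents for a query with a common BM25 variant.

    Assumes each (term, doc) pair occurs at most once in tf_rows.
    """
    doc_len = defaultdict(int)  # document length = sum of tf per doc
    df = defaultdict(int)       # document frequency per term
    for term, doc, tf in tf_rows:
        doc_len[doc] += tf
        df[term] += 1

    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs

    query = set(query_terms)
    scores = defaultdict(float)
    # Only stored (non-zero) cells of query terms are visited, which is
    # analogous to scanning the inverted lists of the query terms.
    for term, doc, tf in tf_rows:
        if term not in query:
            continue
        # Non-negative IDF variant (an assumption, not taken from the paper).
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        norm = tf + k1 * (1 - b + b * doc_len[doc] / avg_len)
        scores[doc] += idf * tf * (k1 + 1) / norm
    return dict(scores)


scores = bm25_scores(TF, ["array"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch the loop touches only the tuples that belong to query terms, which is the sense in which a scan over a sparse relational table can resemble inverted list processing. Compression, score materialization and quantization, which the abstract names, are not modeled here.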