Rate-distortion approach to databases: storage and content-based retrieval

  • Authors:
  • E. Tuncel;P. Koulgi;K. Rose

  • Affiliations:
  • Dept. of Electr. Eng., Univ. of California, Riverside, CA, USA;-;-

  • Venue:
  • IEEE Transactions on Information Theory
  • Year:
  • 2004

Quantified Score

Hi-index 754.90

Visualization

Abstract

This paper investigates the relationship between rate-distortion theory and efficient content-based data retrieval from high-dimensional databases. We consider database design as the encoding of a data object sequence, and retrieval from the database as the decoding of the sequence using side information (i.e., the query) available only at the decoder. We show that, in this setting, the optimal asymptotic tradeoff between the search time Rs (bits per data object read from the storage device) and the expected search accuracy Ds (relevance of the retrieved data set) is given by the Wyner-Ziv solution with a side-information-dependent distortion measure. Moreover, the data indexing and retrieval problem is, in general, inseparable from the data compression problem. Data items selected by the search procedure, which can be stored in the disk with a limited total rate of Rr ≥ Rs, need to be presented at a prescribed expected reconstruction quality Dr. This is, hence, a problem of scalable source coding or successive refinement, albeit with differing layer distortion measures to quantify search and reconstruction quality, respectively. We derive a single-letter characterization of all achievable quadruples {Rs,Rr,Ds,Dr}, and prove conditions for "successive refinability" without rate loss. Finally, we show that the special case Ds=Dr=0 is nontrivial and of practical interest in this context, as it can impose "acceptable" search and reconstruction qualities for each individual data item and for the entire query space with high probability, in contradistinction with standard average distortion requirements. The region of achievable {Rs,Rr} is obtained by adapting Rimoldi's characterization to a new regular scalable coding problem.