In Search of the Lost Schema

  • Authors:
  • Stéphane Grumbach;Giansalvatore Mecca

  • Affiliations:
  • -;-

  • Venue:
  • ICDT '99 Proceedings of the 7th International Conference on Database Theory
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

We study the problem of rediscovering the schema of nested relations that have been encoded as strings for storage purposes. We consider various classes of encoding functions, and consider the markup encodings, which allow to find the schema without knowledge of the encoding function, under reasonable assumptions on the input data. Depending upon the encoding of empty sets, we propose two polynomial on-line algorithms (with different buffer size) solving the schema finding problem. We also prove that with a high probability, both algorithms find the schema after examining a fixed number of tuples, thus leading in practice to a linear time behavior with respect to the database size for wrapping the data. Finally, we show that the proposed techniques are well-suited for practical applications, such as structuring and wrapping HTML pages and Web sites.