SCOOP: A Record Extractor without Knowledge on Input

  • Authors:
  • Yasuhiro Yamada;Daisuke Ikeda;Sachio Hirokawa

  • Venue:
  • DS '01 Proceedings of the 4th International Conference on Discovery Science
  • Year:
  • 2001

Abstract

We present SCOOP, a record extractor system. We assume that the semi-structured documents given to SCOOP share similar formats and that each of them contains only one record consisting of several different fields. SCOOP treats a document as just a string and uses no knowledge of the input except that a field is surrounded by delimiters, a left delimiter ends with "", and the corresponding right delimiter begins with "
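The delimiter assumption can be illustrated with a short Python sketch. It is not SCOOP's extraction algorithm, which this abstract does not describe; the function name `candidate_fields`, its parameters `left_end` and `right_begin`, and the bracket-style sample document are all hypothetical. The sketch treats a document purely as a string and collects the non-empty substrings that lie between a string ending a left delimiter and the next string beginning a right delimiter.

```python
from typing import List


def candidate_fields(doc: str, left_end: str, right_begin: str) -> List[str]:
    """Hypothetical illustration: return the non-empty text found between
    each occurrence of `left_end` and the next occurrence of `right_begin`.
    The document is handled as a plain string, with no parsing of its format."""
    fields: List[str] = []
    i = doc.find(left_end)
    while i != -1:
        start = i + len(left_end)            # field text begins after the left delimiter
        j = doc.find(right_begin, start)     # nearest right delimiter after it
        if j == -1:
            break                            # no closing delimiter remains
        text = doc[start:j].strip()
        if text:                             # skip gaps between adjacent delimiters
            fields.append(text)
        i = doc.find(left_end, j)            # continue scanning after this field
    return fields


print(candidate_fields("[name]Alice[/name][age]30[/age]", "]", "["))
```

With `"]"` as the assumed end of every left delimiter and `"["` as the assumed start of every right delimiter, the call prints `['Alice', '30']`: the empty gap between `[/name]` and `[age]` is discarded, and the final `]` has no following `[`, so scanning stops there.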