SCOOP: A Record Extractor without Knowledge on Input

Authors:
Yasuhiro Yamada; Daisuke Ikeda; Sachio Hirokawa
Venue:
DS '01 Proceedings of the 4th International Conference on Discovery Science
Year:
2001

Abstract

We present SCOOP, a record extraction system. We assume that the semi-structured documents given to SCOOP share a similar format and that each contains exactly one record consisting of several distinct fields. SCOOP treats a document as a plain string and uses no knowledge about the input except that each field is surrounded by delimiters: a left delimiter ends with ">", and the corresponding right delimiter begins with "<".
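The delimiter assumption stated in the abstract can be illustrated with a minimal sketch. It is not SCOOP's extraction method, which the abstract does not spell out here. It assumes tag-like delimiters (a left delimiter ending with ">" and a right delimiter beginning with "<") and only shows what "a field surrounded by delimiters" looks like when the document is treated as a plain string. The names `FIELD_PATTERN`, `candidate_fields`, and `sample` are hypothetical and chosen for illustration.

```python
import re
from typing import List

# Hypothetical helper, not part of SCOOP: a candidate field is any
# non-blank text lying between a ">" and the next "<".
FIELD_PATTERN = re.compile(r">([^<>]+)<")


def candidate_fields(document: str) -> List[str]:
    """Return the stripped text found between each '>' and the next '<'."""
    fields = []
    for match in FIELD_PATTERN.finditer(document):
        text = match.group(1).strip()
        if text:  # skip whitespace-only gaps between adjacent delimiters
            fields.append(text)
    return fields


# Made-up input for illustration; the document is handled as a plain string.
sample = "<html><b>Title</b><i>SCOOP</i> <p>Year: 2001</p></html>"
print(candidate_fields(sample))  # ['Title', 'SCOOP', 'Year: 2001']
```

The sketch never parses the markup or consults a tag vocabulary. It relies only on the two characters named in the assumption, which mirrors the claim that no further knowledge of the input is used.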