PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Record-boundary discovery in Web documents
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Wrapper induction: efficiency and expressiveness
Artificial Intelligence - Special issue on Intelligent internet systems
Extracting Partial Structures from HTML Documents
Proceedings of the Fourteenth International Florida Artificial Intelligence Research Society Conference
Eliminating Useless Parts in Semi-structured Documents Using Alternation Counts
DS '01 Proceedings of the 4th International Conference on Discovery Science
Automatic Wrapper Generation for Multilingual Web Resources
DS '02 Proceedings of the 5th International Conference on Discovery Science
We present SCOOP, a record extraction system. We assume that the semi-structured documents given to SCOOP share a similar format and that each contains a single record consisting of several distinct fields. SCOOP treats a document as a plain string and uses no knowledge of the input except that each field is surrounded by delimiters: a left delimiter ends with ">", and the corresponding right delimiter begins with "<".
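To make the delimiter assumption concrete, here is a minimal Python sketch of pulling fields out of a document treated as a plain string. It is not SCOOP's algorithm: it assumes the delimiter pairs are already known, and it reads the assumption as "a left delimiter ends with `>` and the matching right delimiter begins with `<`". The function name `extract_fields` and the sample document are hypothetical, introduced only for illustration.

```python
def extract_fields(document: str, delimiters: list[tuple[str, str]]) -> list[str]:
    """Return the text found between each (left, right) delimiter pair, in order.

    Illustrative only: the delimiter pairs are supplied by the caller here,
    and each pair is searched for from where the previous field ended.
    """
    fields = []
    pos = 0
    for left, right in delimiters:
        # Enforce the assumed shape of a delimiter pair.
        if not left.endswith(">") or not right.startswith("<"):
            raise ValueError(f"unexpected delimiter pair: {left!r}, {right!r}")
        start = document.find(left, pos)
        if start == -1:
            raise ValueError(f"left delimiter not found: {left!r}")
        start += len(left)
        end = document.find(right, start)
        if end == -1:
            raise ValueError(f"right delimiter not found: {right!r}")
        fields.append(document[start:end])
        # Resume at the right delimiter so it may overlap the next left one.
        pos = end
    return fields


doc = "<html><b>Title</b><i>42</i></html>"
pairs = [("<b>", "</b>"), ("<i>", "</i>")]
print(extract_fields(doc, pairs))  # ['Title', '42']
```

The document is never parsed as markup; only substring search is used, which matches the "just a string" view described in the abstract.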