We propose PIES, a new web information extraction system that converts web information into XML documents. PIES relies on a user-specified ontology and on descriptions of HTML tag patterns: the pattern descriptions extract information from web pages, and the ontology validates what they extract. We designed a new language for describing HTML tag patterns and extraction rules. We implemented PIES and evaluated it on the US patent web site.
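
The extract-then-validate flow described above can be illustrated with a minimal sketch. This is not the PIES pattern language or its implementation, which the abstract does not specify. Regular expressions stand in for the tag pattern descriptions, simple predicates stand in for the ontology, and every name here (`TAG_PATTERNS`, `ONTOLOGY`, `extract_to_xml`, the sample HTML) is a hypothetical chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a regex with one capture group stands in for a tag pattern description.
TAG_PATTERNS = {
    "patent_number": re.compile(r"<th>Number</th><td>([^<]*)</td>"),
    "title": re.compile(r"<th>Title</th><td>([^<]*)</td>"),
}

# Assumption: the "ontology" is reduced to one validity predicate per concept.
ONTOLOGY = {
    "patent_number": lambda v: re.fullmatch(r"\d{1,3}(,\d{3})*", v) is not None,
    "title": lambda v: len(v) > 0,
}


def extract_to_xml(html, root_tag="patent"):
    """Extract fields with the tag patterns, keep only the values the
    ontology accepts, and serialize the result as an XML string."""
    root = ET.Element(root_tag)
    for field, pattern in TAG_PATTERNS.items():
        match = pattern.search(html)
        if match is None:
            continue  # pattern did not match this page
        value = match.group(1).strip()
        if ONTOLOGY[field](value):  # validation step
            ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")


SAMPLE_HTML = (
    "<table><tr><th>Number</th><td>5,123,456</td></tr>"
    "<tr><th>Title</th><td>Widget</td></tr></table>"
)

if __name__ == "__main__":
    print(extract_to_xml(SAMPLE_HTML))
```

In this sketch, a value that matches a tag pattern but fails its ontology check is dropped from the XML output, which mirrors the validating role the abstract assigns to the ontology.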