PIES: a web information extraction system using ontology and tag patterns

  • Authors: Byung-Kwon Park; Hyoil Han; Il-Yeol Song
  • Affiliations: Dong-A University, Busan, Korea; Drexel University, Philadelphia, PA; Drexel University, Philadelphia, PA
  • Venue: WAIM'05 Proceedings of the 6th International Conference on Advances in Web-Age Information Management
  • Year: 2005

Abstract

We propose a new web information extraction system, PIES, to convert web information into XML documents. PIES uses a user-specified ontology and HTML tag pattern descriptions. The ontology validates the web information the pattern descriptions extract. We designed a new language to describe HTML tag patterns and extraction rules. We implemented PIES and applied it to the US patent web site for evaluation.
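The pipeline the abstract describes (tag patterns extract candidate values, an ontology validates them, and the result is written as XML) can be illustrated with a minimal sketch. This is not the PIES pattern language or its implementation, neither of which is shown here. The names `ONTOLOGY`, `TAG_PATTERNS`, `extract`, and `SAMPLE`, the regular expressions, and the sample HTML are all hypothetical stand-ins, and plain Python regular expressions take the place of the paper's pattern-description language.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical ontology: concept name -> predicate a value must satisfy.
# A real ontology would carry richer constraints; this is an assumption.
ONTOLOGY = {
    "number": lambda v: re.fullmatch(r"\d{1,3}(,\d{3})*", v) is not None,
    "title": lambda v: len(v) > 0,
}

# Hypothetical tag patterns: concept name -> regex over raw HTML.
# Each pattern captures the text that sits inside a given tag structure.
TAG_PATTERNS = {
    "number": re.compile(r"<td[^>]*>\s*<b>([^<]+)</b>\s*</td>", re.I),
    "title": re.compile(r"<h1[^>]*>([^<]+)</h1>", re.I),
}

def extract(html, root_tag="patent"):
    """Extract values with tag patterns, keep only those the ontology accepts."""
    root = ET.Element(root_tag)
    for concept, pattern in TAG_PATTERNS.items():
        match = pattern.search(html)
        if match is None:
            continue  # the pattern found nothing for this concept
        value = match.group(1).strip()
        if ONTOLOGY[concept](value):  # validation step
            ET.SubElement(root, concept).text = value
    return ET.tostring(root, encoding="unicode")

# Made-up page fragment, for illustration only.
SAMPLE = (
    '<h1>Example widget</h1>'
    '<table><tr><td align="right"><b>6,000,000</b></td></tr></table>'
)

if __name__ == "__main__":
    print(extract(SAMPLE))
```

In this sketch a value that matches a tag pattern but fails its ontology check is dropped rather than written to the XML output, which is one plausible reading of "validates"; the paper may handle rejected values differently.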