Towards a method for unsupervised web information extraction

  • Authors:
  • Hassan A. Sleiman; Rafael Corchuelo

  • Affiliations:
  • ETSI Informática, Universidad de Sevilla, Sevilla, Spain; ETSI Informática, Universidad de Sevilla, Sevilla, Spain

  • Venue:
  • ICWE'12 Proceedings of the 12th international conference on Web Engineering
  • Year:
  • 2012

Abstract

The literature provides a variety of techniques to build the information extractors on which some data integration systems rely. These techniques are usually based on extraction rules that require maintenance and adaptation whenever web sources change. We present our preliminary steps towards an unsupervised information extraction technique that searches web documents for shared patterns and fragments them until it finds the relevant information that should be extracted. Experimental results on 1230 real-world web documents show that our system runs fast and achieves promising results.
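
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it describes (find a pattern shared by all documents, split every document around it, and repeat on the fragments), not the authors' actual method. It assumes whitespace-separated tokens and uses a brute-force longest-common-run search. The names `tokenize`, `longest_shared_pattern`, and `extract` are hypothetical and chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Tokens = Tuple[str, ...]


def tokenize(text: str) -> Tokens:
    """Assumption: tokens are separated by whitespace."""
    return tuple(text.split())


def find_sub(seq: Tokens, pat: Tokens) -> int:
    """Index of the first occurrence of pat in seq, or -1 if absent."""
    for i in range(len(seq) - len(pat) + 1):
        if seq[i:i + len(pat)] == pat:
            return i
    return -1


def longest_shared_pattern(docs: Sequence[Tokens]) -> Optional[Tokens]:
    """Longest contiguous token run present in every document (brute force)."""
    base = min(docs, key=len)
    for size in range(len(base), 0, -1):
        for start in range(len(base) - size + 1):
            cand = base[start:start + size]
            if all(find_sub(d, cand) >= 0 for d in docs):
                return cand
    return None


def extract(docs: Sequence[Tokens]) -> List[List[str]]:
    """Fragment docs around shared patterns; return the varying fragments.

    Each inner list is one candidate slot, holding one value per document.
    """
    if all(len(d) == 0 for d in docs):
        return []
    pattern = longest_shared_pattern(docs)
    if pattern is None:
        # Nothing shared is left, so the fragments are candidate data.
        return [[" ".join(d) for d in docs]]
    cuts = [find_sub(d, pattern) for d in docs]
    prefixes = [d[:c] for d, c in zip(docs, cuts)]
    suffixes = [d[c + len(pattern):] for d, c in zip(docs, cuts)]
    return extract(prefixes) + extract(suffixes)


pages = [
    tokenize("<html> <b> Title </b> Dune <b> Price </b> 9.99 </html>"),
    tokenize("<html> <b> Title </b> Emma <b> Price </b> 4.50 </html>"),
]
print(extract(pages))  # [['Dune', 'Emma'], ['9.99', '4.50']]
```

A limitation of this sketch is that a value which happens to be identical in every document would be mistaken for part of the shared template, and the brute-force search would be far too slow for a collection of the size reported in the abstract.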