Sift: an end-user tool for gathering web content on the go

  • Authors:
  • Matthias Geel;Timothy Church;Moira C. Norrie

  • Affiliations:
  • ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland

  • Venue:
  • Proceedings of the 2012 ACM symposium on Document engineering
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Although web sites have started to embed semantic metadata within their documents, it remains a challenge for non-technical end-users to exploit that markup to extract and store information of interest. To address this challenge, we show how tools can be developed that allow users to identify extractable information while browsing and then control how that information should be extracted and stored in a personal library. The proposed approach is based on an extensible framework capable of using different kinds of markup to aid the extraction process and a unique fusion of several well-established techniques from areas such as the semantic web, data warehousing, web scraping and web feeds. We present the Sift tool which is a proof-of-concept implementation of the approach.