Sift: an end-user tool for gathering web content on the go

Authors:
Matthias Geel;Timothy Church;Moira C. Norrie
Affiliations:
ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland
Venue:
Proceedings of the 2012 ACM symposium on Document engineering
Year:
2012

Abstract

Although web sites have started to embed semantic metadata within their documents, it remains a challenge for non-technical end-users to exploit that markup to extract and store information of interest. To address this challenge, we show how tools can be developed that allow users to identify extractable information while browsing and then control how that information is extracted and stored in a personal library. The proposed approach is based on an extensible framework that can use different kinds of markup to aid the extraction process, and on a unique fusion of several well-established techniques from areas such as the semantic web, data warehousing, web scraping and web feeds. We present the Sift tool, a proof-of-concept implementation of the approach.
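
To make the idea of an extensible, markup-driven extraction framework more concrete, the following is a minimal sketch and not the Sift implementation. It assumes a registry in which each kind of markup gets its own extractor, and it shows only one such extractor, for HTML microdata (`itemscope` / `itemprop` attributes). All names (`EXTRACTORS`, `register`, `extract_microdata`, `sift`) are hypothetical, nested items are ignored, and the "personal library" is simplified to a plain list.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List

# Hypothetical registry: markup kind -> extractor function.
# New kinds of markup (e.g. RDFa, feeds) would be added the same way.
EXTRACTORS: Dict[str, Callable[[str], List[dict]]] = {}


def register(kind: str):
    """Decorator that adds an extractor to the registry under `kind`."""
    def wrap(fn):
        EXTRACTORS[kind] = fn
        return fn
    return wrap


class _MicrodataParser(HTMLParser):
    """Collects flat itemprop values per itemscope (nesting is ignored)."""

    def __init__(self):
        super().__init__()
        self.items: List[dict] = []
        self._current = None  # item currently being filled
        self._prop = None     # itemprop waiting for its text value

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemscope" in a:
            self._current = {"type": a.get("itemtype") or ""}
            self.items.append(self._current)
        elif "itemprop" in a and self._current is not None:
            if a.get("content") is not None:
                # Value carried in an attribute, e.g. <meta itemprop content>.
                self._current[a["itemprop"]] = a["content"]
            else:
                self._prop = a["itemprop"]

    def handle_data(self, data):
        if self._prop and data.strip():
            self._current[self._prop] = data.strip()
            self._prop = None


@register("microdata")
def extract_microdata(html: str) -> List[dict]:
    parser = _MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


def sift(html: str, library: List[dict]) -> List[dict]:
    """Run every registered extractor and store the results in `library`."""
    for kind, extractor in EXTRACTORS.items():
        for item in extractor(html):
            library.append({"markup": kind, **item})
    return library
```

Under these assumptions, calling `sift(page_html, [])` on a page that contains a microdata-annotated element returns one dictionary per item, tagged with the kind of markup it came from. Supporting another kind of markup would only require registering another extractor function.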