Although web sites have started to embed semantic metadata within their documents, it remains a challenge for non-technical end-users to exploit that markup to extract and store information of interest. To address this challenge, we show how tools can be developed that allow users to identify extractable information while browsing and then control how that information is extracted and stored in a personal library. The proposed approach is based on an extensible framework that can use different kinds of markup to aid the extraction process, and on a unique fusion of several well-established techniques from areas such as the semantic web, data warehousing, web scraping, and web feeds. We present Sift, a proof-of-concept implementation of the approach.
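
To make the idea of an extensible, markup-driven framework more concrete, the following is a minimal sketch of how pluggable extractors might be organized. It is not Sift's actual design: the registry, the function names (`register`, `extract_all`, `store`), the two example markup kinds, and the in-memory library are all assumptions introduced here for illustration, using only the Python standard library.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List

Record = Dict[str, str]
Extractor = Callable[[str], List[Record]]

# Hypothetical registry: one extractor per kind of markup.
# Supporting a new kind of markup means registering one more function.
EXTRACTORS: Dict[str, Extractor] = {}


def register(kind: str) -> Callable[[Extractor], Extractor]:
    def wrap(fn: Extractor) -> Extractor:
        EXTRACTORS[kind] = fn
        return fn
    return wrap


class _MetaParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Record = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") and a.get("content"):
            self.fields[a["name"]] = a["content"]


class _ItempropParser(HTMLParser):
    """Collects the text directly inside elements carrying an itemprop attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Record = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        self._current = dict(attrs).get("itemprop")

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


@register("meta-tags")
def extract_meta(html: str) -> List[Record]:
    parser = _MetaParser()
    parser.feed(html)
    return [parser.fields] if parser.fields else []


@register("microdata")
def extract_microdata(html: str) -> List[Record]:
    parser = _ItempropParser()
    parser.feed(html)
    return [parser.fields] if parser.fields else []


def extract_all(html: str) -> Dict[str, List[Record]]:
    """Run every registered extractor; report only the kinds that found something."""
    found = {kind: fn(html) for kind, fn in EXTRACTORS.items()}
    return {kind: records for kind, records in found.items() if records}


def store(library: List[Record], found: Dict[str, List[Record]], chosen: List[str]) -> None:
    """Save only the kinds of records the user chose to keep."""
    for kind in chosen:
        for record in found.get(kind, []):
            library.append({"kind": kind, **record})


page = (
    '<html><head><meta name="author" content="A. Author"></head>'
    '<body><div itemscope><span itemprop="name">Sift</span></div></body></html>'
)
found = extract_all(page)      # what is extractable on this page
library: List[Record] = []     # stand-in for a personal library
store(library, found, chosen=["microdata"])
```

In this sketch, `extract_all` corresponds to identifying extractable information while browsing, and `store` to the user controlling what ends up in the personal library. A real tool would need more robust parsing and persistent storage.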