We demonstrate here a new approach to Web archival crawling, based on an application-aware helper that drives crawls of Web applications according to their type (in particular, according to their content management system). By adapting the crawling strategy to the type of Web application, a given Web application (say, a particular forum or blog) can be crawled with fewer requests than traditional crawling techniques require. The application-aware helper can also extract semantic content from the crawled Web pages, yielding a Web archive of greater value to its users. In our demonstration scenario, we invite users to compare application-aware crawling with regular Web crawling on a Web site of their choice, in terms of both crawling efficiency and the experience of browsing and searching the resulting archive.
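To make the idea of type-dependent crawling concrete, here is a minimal Python sketch of application-aware link selection. It assumes the application type can be read from the `<meta name="generator">` tag that many content management systems (WordPress, for example) emit. The `SKIP_RULES` table, its URL patterns, and the function names are hypothetical illustrations, not the helper demonstrated here. Once the type is known, links to redundant views of the same content (comment-reply links, feeds, login pages) are dropped, which is one way to issue fewer requests than a generic crawler that follows every same-site link.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _PageScanner(HTMLParser):
    """Collects the <meta name="generator"> value and all <a href> targets."""

    def __init__(self):
        super().__init__()
        self.generator = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "generator":
            self.generator = a.get("content") or ""
        elif tag == "a" and a.get("href"):
            self.links.append(a["href"])


# Hypothetical per-application rules: URL patterns that lead to redundant
# views of content already reachable elsewhere, so they can be skipped.
SKIP_RULES = {
    "wordpress": [r"[?&]replytocom=", r"/feed/?$", r"/wp-login\.php"],
    "drupal": [r"/user/login", r"[?&]destination="],
}


def detect_cms(generator):
    """Map a generator string to a known application type, or None."""
    g = (generator or "").lower()
    for name in SKIP_RULES:
        if name in g:
            return name
    return None


def links_to_follow(base_url, html):
    """Return (cms, links): the detected type and same-site URLs worth fetching."""
    scanner = _PageScanner()
    scanner.feed(html)
    cms = detect_cms(scanner.generator)
    host = urlparse(base_url).netloc
    # Unknown type: no skip rules, i.e. fall back to generic crawling.
    skip = [re.compile(p) for p in SKIP_RULES.get(cms, [])]
    out = []
    for href in scanner.links:
        url = urljoin(base_url, href)
        if urlparse(url).netloc != host:
            continue  # stay on the site being archived
        if any(p.search(url) for p in skip):
            continue  # redundant view for this application type
        if url not in out:
            out.append(url)  # deduplicate while keeping discovery order
    return cms, out


if __name__ == "__main__":
    page = """<html><head><meta name="generator" content="WordPress 6.4"></head>
    <body><a href="/2024/01/post-1/">Post</a>
    <a href="/2024/01/post-1/?replytocom=5">Reply</a>
    <a href="/feed/">Feed</a></body></html>"""
    print(links_to_follow("https://blog.example/", page))
```

A page with no recognizable generator falls back to generic behavior: every same-site link is followed. A real helper would need more robust detection than a single meta tag, since sites can remove or alter it.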