Demonstrating intelligent crawling and archiving of web applications

Authors:
Muhammad Faheem;Pierre Senellart
Affiliations:
Institut Mines-Télécom ParisTech, Paris, France;Télécom ParisTech & The University of Hong Kong, Hong Kong, Paris, France
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

We demonstrate here a new approach to Web archival crawling, based on an application-aware helper that drives crawls of Web applications according to their types (especially, according to their content management systems). By adapting the crawling strategy to the Web application type, one is able to crawl a given Web application (say, a given forum or blog) with fewer requests than traditional crawling techniques. Additionally, the application-aware helper is able to extract semantic content from the Web pages crawled, which results in a Web archive of richer value to an archive user. In our demonstration scenario, we invite a user to compare application-aware crawling to regular Web crawling on the Web site of their choice, both in terms of efficiency and of experience in browsing and searching the archive.
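
The idea of adapting the crawl to the application type can be illustrated with a minimal sketch. This is not the authors' implementation: the application types, the detection signatures ("ExampleForum", "ExampleBlog"), the URL patterns, and the function names below are all hypothetical, chosen only to show how a helper might first recognize the kind of Web application and then follow only the links that matter for it, falling back to regular crawling when the type is unknown.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AppType:
    """A hypothetical description of one kind of Web application."""
    name: str
    detect: re.Pattern  # matched against the page HTML
    follow: re.Pattern  # matched against candidate link URLs


# Invented signatures for illustration only; real systems would need
# patterns derived from actual content management systems.
APP_TYPES = [
    AppType(
        name="forum",
        detect=re.compile(r'<meta name="generator" content="ExampleForum'),
        follow=re.compile(r"/(forum|thread)/\d+$"),
    ),
    AppType(
        name="blog",
        detect=re.compile(r'<meta name="generator" content="ExampleBlog'),
        follow=re.compile(r"/\d{4}/\d{2}/[\w-]+/?$"),
    ),
]


def detect_app_type(html: str) -> Optional[AppType]:
    """Return the first application type whose signature occurs in the page."""
    for app in APP_TYPES:
        if app.detect.search(html):
            return app
    return None


def select_links(html: str, links: List[str]) -> List[str]:
    """Keep only the links worth requesting for the detected application type.

    When no type is recognized, every link is kept, which corresponds to
    regular (application-unaware) crawling.
    """
    app = detect_app_type(html)
    if app is None:
        return list(links)
    return [url for url in links if app.follow.search(url)]
```

For a page recognized as a forum, such a filter would skip login pages and alternative sort orders of the same thread, which is one way a crawl could use fewer requests than following every link.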