Template detection via data mining and its applications
Proceedings of the 11th international conference on World Wide Web
Eliminating noisy information in Web pages for data mining
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Extracting unstructured data from template generated web documents
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Page-level template detection via isotonic smoothing
Proceedings of the 16th international conference on World Wide Web
Hi-index | 0.00 |
Most template detection methods process web pages in batches that a newly crawled page can not be processed until enough pages have been collected. This results in large storage consumption and a huge delay of data refreshing. In this paper, we present an incremental framework to detect templates in which a page is processed as soon as it has been crawled. In this framework, we don't need to cache any web page. Experiments show that our framework consumes less than 7% storage than traditional methods. And also the speed of data refreshing is accelerated because of the incremental manner.